Abstract

We study contractivity properties of gradient flows for functions on normed 
spaces or, more generally, on Finsler manifolds. Contractivity of the flows turns out 
to be equivalent to a new notion of convexity for the functions. This is different 
from the usual convexity along geodesics in non-Riemannian Finsler manifolds. As
an application, we show that the heat flow on Minkowski normed spaces other than 
inner product spaces is not contractive with respect to the quadratic Wasserstein 
distance. 



1 Introduction 

The main goal of this article is to prove that, for the heat flow on a Minkowski normed space, no bound for the exponential growth of the $L^2$-Wasserstein distance exists, unless the space is an inner product space. This is rather surprising, in particular, in view of the fact that the heat flow is the gradient flow in the $L^2$-Wasserstein space $\mathcal{P}_2$ of the relative entropy and the fact that the latter is known to be a convex function on $\mathcal{P}_2$. In order to find an explanation for this phenomenon, we will first of all study the contraction of the gradient flow of a function on a Finsler manifold. A Finsler manifold is a manifold carrying a Minkowski norm on each tangent space, instead of an inner product as for Riemannian manifolds. A Minkowski norm is a generalization of usual norms, and is not necessarily centrally symmetric. We will always assume that a Minkowski norm is strongly convex (and in particular strictly convex, see Subsection 2.1 for the definition).

In Riemannian manifolds, given $K \in \mathbb{R}$, it is well known that the $K$-convexity of a function $f$ along geodesics $\gamma$ (i.e., $(f \circ \gamma)'' \ge K|\dot\gamma|^2$ in the weak sense) implies the $K$-contraction of the gradient flow of $f$, namely

$$ d(\xi(t), \zeta(t)) \le e^{-Kt}\, d(\xi(0), \zeta(0)) $$

holds for all $t \ge 0$ and $\xi, \zeta$ solving $\dot\xi(t) = \nabla(-f)(\xi(t))$, $\dot\zeta(t) = \nabla(-f)(\zeta(t))$. This is obtained via the first variation formulas for the distance $d(\xi(t), \zeta(t))$ and the function $f$. In Finsler manifolds or even in strictly convex normed spaces, however, it has been unclear whether the gradient flows of convex functions are contractive (cf. [AGS, Introduction]). To avoid trivial counter-examples, of course strict convexity must be imposed (see Remark 4.1).

The point is that, although the aforementioned first variation formulas do exist also in the Finsler setting, they use different approximate inner products (see the paragraph following Definition 3.1). Keeping this in mind, we introduce a new notion of convexity, called the skew convexity, which is equivalent to the usual convexity in Riemannian manifolds. We show that the $K$-skew convexity of a function on a Finsler manifold is equivalent to the $K$-contraction of its gradient flow (Theorem 3.2). A difference between the skew convexity and the convexity along geodesics is observed by considering distance functions. In Minkowski spaces, the squared norm is always 2-skew convex, while in general it is only $K$-convex for some $K > 0$. We also construct an explicit example of a convex function which is not 0-skew convex. This negatively answers the above question in [AGS] (see Section 4 for details).

In the second part of the article, we apply our technique to the heat flow on Minkowski spaces. Due to the celebrated work of Jordan et al. [JKO], the heat flow on Euclidean spaces can be regarded as the gradient flow of the relative entropy in the $L^2$-Wasserstein space. This provides a somewhat geometric interpretation of the non-expansion (0-contraction) of heat flow with respect to the Wasserstein distance, as the relative entropy is known to be convex along Wasserstein geodesics (also called displacement convex, [Mc]). More generally, on Riemannian manifolds, both the $K$-convexity of the relative entropy and the $K$-contraction of heat flow are equivalent to the lower Ricci curvature bound $\mathrm{Ric} \ge K$ ([vRS]). Note that the Wasserstein space over a Riemannian manifold possesses a sort of Riemannian structure, for which the first variation formulas are available (see [Ot], [AGS], [Vi], [Er]). We also remark that Gigli [Gi] recently showed the uniqueness of the gradient flow of the relative entropy (with respect to a probability measure) for metric measure spaces such that the relative entropy is $K$-convex for some $K \in \mathbb{R}$, without relying on the contractivity.

In our previous works [Oh3], [OS1], we have extended the equivalence between the Ricci curvature bound and the convexity of the relative entropy, as well as the identification of (nonlinear) heat flow with the gradient flow of the relative entropy with respect to the reverse Wasserstein distance, to Finsler manifolds. In particular, the relative entropy on any Minkowski space is convex (see also [Vi, page 908]). Then it is natural to ask whether the heat flow on Minkowski spaces is contractive or not (see also the fourth comment in [Gi, Section 5]). Our main result gives a complete answer to this question.

Theorem 1.1 The heat flow on a Minkowski normed space $(\mathbb{R}^n, \|\cdot\|)$ is not $K$-contractive with respect to the reverse $L^2$-Wasserstein distance for any $K \in \mathbb{R}$, unless $(\mathbb{R}^n, \|\cdot\|)$ is an inner product space.

Our proof uses a geometric characterization of inner products among Minkowski norms (see the claims in Section 6).

Theorem 1.1 means that the Wasserstein contraction implies that the space must be Riemannian. This contrasts with the aforementioned fact that the convexity of the relative entropy (more generally, the curvature-dimension condition) works well for general Finsler manifolds. Among other characterizations of lower Ricci curvature bounds for Riemannian manifolds, we recently verified that the Bochner-Weitzenböck formula makes sense for general Finsler manifolds ([OS2]).

The article is organized as follows. After preliminaries for Minkowski and Finsler geometries, we introduce the skew convexity in Section 3 and study the skew convexity of squared norms of Minkowski spaces in Section 4. In Section 5 we discuss the heat flow on Minkowski spaces. We give a detailed explanation of how to identify it with the gradient flow of the relative entropy, because some results in [OS1] are not directly applicable to noncompact spaces. Section 6 is devoted to a proof of Theorem 1.1. Finally, we consider the skew convexity of distance functions on Finsler manifolds in the Appendix.

2 Preliminaries 

We review the basics of Minkowski spaces and Finsler manifolds. We refer to [BCS] and [Sh] for Finsler geometry, and to [BCS, Chapter 14] for Minkowski spaces.

2.1 Minkowski spaces 

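In this article, a Minkowski norm will mean a nonnegative function $\|\cdot\| : \mathbb{R}^n \to [0,\infty)$ satisfying the following conditions.

(1) (Positive homogeneity) $\|cx\| = c\|x\|$ holds for all $x \in \mathbb{R}^n$ and $c > 0$.

(2) (Strong convexity) The function $\|\cdot\|^2/2$ is twice differentiable on $\mathbb{R}^n \setminus \{0\}$, and the symmetric matrix

$$ \big(g_{ij}(x)\big)_{i,j=1}^n := \Big(\frac{1}{2}\frac{\partial^2(\|\cdot\|^2)}{\partial x^i\,\partial x^j}(x)\Big)_{i,j=1}^n \tag{2.1} $$

is measurable in $x$ and uniformly elliptic in the sense that there are constants $\lambda, \Lambda > 0$ such that

$$ \lambda \sum_{i=1}^n (a^i)^2 \le \sum_{i,j=1}^n g_{ij}(x)\, a^i a^j \le \Lambda \sum_{i=1}^n (a^i)^2 \tag{2.2} $$

holds for all $x \in \mathbb{R}^n \setminus \{0\}$ and $(a^i) \in \mathbb{R}^n$ (in particular, $\|x\| > 0$ for all $x \ne 0$).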

We call $(\mathbb{R}^n, \|\cdot\|)$ a Minkowski (normed) space. We remark that the strong convexity implies the strict convexity, i.e., $\|x + y\| < \|x\| + \|y\|$ unless $x$ and $y$ are linearly dependent.

Note that the homogeneity is imposed only in the positive direction, so that $\|-x\| \ne \|x\|$ is allowed. We also remark that the function $\|\cdot\|^2/2$ is twice differentiable at the origin only in inner product spaces. Given $x \in \mathbb{R}^n \setminus \{0\}$, the matrix (2.1) defines the inner product $g_x$ of $\mathbb{R}^n$ by

$$ g_x\big((a^i), (b^j)\big) := \sum_{i,j=1}^n g_{ij}(x)\, a^i b^j. \tag{2.3} $$

This is the best approximation of the norm $\|\cdot\|$ in the direction $x$ in the sense that the unit sphere of $g_x$ is tangent to that of $\|\cdot\|$ at $x/\|x\|$ up to the second order (Figure 1). In particular, we have $g_x(x,x) = \|x\|^2$. If the original norm comes from an inner product, then $g_x$ coincides with it for all $x$.
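As a quick numerical check of (2.1) and of the identity $g_x(x,x) = \|x\|^2$, the following Python sketch compares a finite-difference Hessian of $\|\cdot\|^2/2$ with a closed form. It assumes a hypothetical Randers norm $F(v) = |v| + \langle b, v\rangle$ with Euclidean $|\cdot|$ and $|b| < 1$ (a standard non-symmetric Minkowski norm, not one used in the text); the closed form is our own computation of the Hessian of $F^2/2$, and all function names are illustrative.

```python
import numpy as np

# Hypothetical test norm (an assumption, not from the text): the Randers norm
# F(v) = |v| + <b, v> on R^2 with Euclidean |.| and |b| < 1.
b = np.array([0.3, -0.2])


def randers_norm(v):
    return np.linalg.norm(v) + b @ v


def g_matrix_fd(x, norm=randers_norm, h=1e-4):
    """Approximate g_ij(x), the Hessian of ||.||^2/2 at x as in (2.1),
    by central finite differences."""
    n = len(x)
    half_sq = lambda v: 0.5 * norm(v) ** 2
    e = np.eye(n) * h
    g = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            g[i, j] = (half_sq(x + e[i] + e[j]) - half_sq(x + e[i] - e[j])
                       - half_sq(x - e[i] + e[j]) + half_sq(x - e[i] - e[j])) / (4 * h * h)
    return g


def g_matrix_randers(x):
    """Closed form of the same Hessian for the Randers norm:
    g_x = (F(x)/|x|) (I - u u^T) + (u + b)(u + b)^T with u = x/|x|."""
    r = np.linalg.norm(x)
    u = x / r
    return (randers_norm(x) / r) * (np.eye(len(x)) - np.outer(u, u)) + np.outer(u + b, u + b)


x = np.array([1.0, 2.0])
g_x = g_matrix_randers(x)
print(np.abs(g_matrix_fd(x) - g_x).max())   # finite-difference error, small
print(x @ g_x @ x, randers_norm(x) ** 2)     # g_x(x, x) = ||x||^2
print(randers_norm(x), randers_norm(-x))     # ||-x|| differs from ||x||
```

The last line shows that positive homogeneity alone allows $\|-x\| \ne \|x\|$, as remarked above.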
Figure 1: the unit sphere of $g_x$ is tangent to that of $\|\cdot\|$ at $x/\|x\|$ up to the second order.



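We define the 2-uniform convexity and smoothness constants $\mathsf{C}, \mathsf{S} \in [1,\infty)$ as the least constants satisfying

$$ \Big\|\frac{x+y}{2}\Big\|^2 \le \frac{1}{2}\|x\|^2 + \frac{1}{2}\|y\|^2 - \frac{1}{4\mathsf{C}^2}\|x-y\|^2, \qquad \Big\|\frac{x+y}{2}\Big\|^2 \ge \frac{1}{2}\|x\|^2 + \frac{1}{2}\|y\|^2 - \frac{\mathsf{S}^2}{4}\|x-y\|^2 $$

for all $x, y \in \mathbb{R}^n$. In other words, $\mathsf{C}^{-2}$ and $\mathsf{S}^2$ are the moduli of convexity and concavity of $\|\cdot\|^2/2$, respectively. Thanks to (2.2), $\mathsf{C} < \infty$ and $\mathsf{S} < \infty$ hold. Indeed, we know

$$ \mathsf{C} = \sup_{x,y \in \mathbb{R}^n \setminus \{0\}} \frac{\|y\|}{g_x(y,y)^{1/2}}, \qquad \mathsf{S} = \sup_{x,y \in \mathbb{R}^n \setminus \{0\}} \frac{g_x(y,y)^{1/2}}{\|y\|} \tag{2.4} $$

(cf. [Oh2, Proposition 4.6]). Note also that $\mathsf{C} = 1$ or $\mathsf{S} = 1$ holds if and only if the norm is an inner product.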
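The formulas (2.4) suggest a direct numerical estimate of $\mathsf{C}$ and $\mathsf{S}$. The sketch below, again for a hypothetical Randers norm $\|v\| = |v| + \langle b, v\rangle$ with a hand-computed closed form of $g_x$, takes maxima over a grid of directions, so it only produces lower estimates of the two suprema; it illustrates that both constants equal 1 for the Euclidean norm and exceed 1 otherwise.

```python
import numpy as np

# Hypothetical test norm (assumption): Randers norm F(v) = |v| + <b, v>, |b| < 1.


def randers_norm(v, b):
    return np.linalg.norm(v, axis=-1) + v @ b


def g_randers(x, b):
    """Hessian of F^2/2 at x != 0 (closed form for the Randers norm)."""
    r = np.linalg.norm(x)
    u = x / r
    return (randers_norm(x, b) / r) * (np.eye(2) - np.outer(u, u)) + np.outer(u + b, u + b)


def estimate_C_S(b, m=360):
    """Grid lower estimates of the suprema in (2.4).

    Both ratios are invariant under x -> cx and y -> cy for c > 0, so it
    suffices to let x and y run over m directions on the unit circle."""
    angles = np.linspace(0.0, 2.0 * np.pi, m, endpoint=False)
    dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    norms = randers_norm(dirs, b)                          # ||y|| for every y
    C = S = 0.0
    for x in dirs:
        gx = g_randers(x, b)
        ratio = np.sqrt(np.einsum("ij,jk,ik->i", dirs, gx, dirs)) / norms
        S = max(S, ratio.max())                            # g_x(y,y)^{1/2} / ||y||
        C = max(C, (1.0 / ratio).max())                    # ||y|| / g_x(y,y)^{1/2}
    return C, S


print(estimate_C_S(np.array([0.0, 0.0])))   # Euclidean norm: both approximately 1
print(estimate_C_S(np.array([0.3, 0.0])))   # not an inner product: both above 1
```

For $b = (0.3, 0)$, the pairs $x = (\pm 1, 0)$, $y = (0, 1)$ already give $g_x(y,y) = 1 \pm 0.3$ while $\|y\| = 1$, which is why both estimates stay clearly above 1.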

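Denote by $\|\cdot\|_*$ the dual norm of $\|\cdot\|$. Then the Legendre transform $\mathcal{L} : (\mathbb{R}^n, \|\cdot\|) \to (\mathbb{R}^n, \|\cdot\|_*)$ associates $x$ with $\mathcal{L}(x)$ satisfying $\|\mathcal{L}(x)\|_* = \|x\|$ and $[\mathcal{L}(x)](x) = \|x\|^2$. Note that (2.2) ensures that $\mathcal{L}(x)$ is indeed uniquely determined. Moreover, $\mathcal{L}(x) = (\mathcal{L}_j(x))_{j=1}^n$ can be explicitly written as

$$ \mathcal{L}_j(x) = \frac{1}{2}\frac{\partial(\|\cdot\|^2)}{\partial x^j}(x) = \sum_{i=1}^n g_{ij}(x)\, x^i. \tag{2.5} $$

The Legendre transform of inverse direction $\mathcal{L}^* : (\mathbb{R}^n, \|\cdot\|_*) \to (\mathbb{R}^n, \|\cdot\|)$ is nothing but the inverse map $\mathcal{L}^* = \mathcal{L}^{-1}$ by definition. For a function $f : \mathbb{R}^n \to \mathbb{R}$ and $x \in \mathbb{R}^n$ at which $f$ is differentiable, we define the gradient vector of $f$ at $x$ by $\nabla f(x) := \mathcal{L}^*(Df(x)) \in T_x\mathbb{R}^n$ (identified with $\mathbb{R}^n$).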
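The defining properties of $\mathcal{L}$ and the definition $\nabla f = \mathcal{L}^*(Df)$ can be illustrated in $\mathbb{R}^2$. This is a minimal sketch, assuming a hypothetical Randers norm and identifying covectors with $\mathbb{R}^2$ through the Euclidean dot product; $\mathcal{L}$ is taken from (2.5) in closed form, while $\|\cdot\|_*$ and $\mathcal{L}^*$ are approximated by brute force on a fine grid of the unit sphere (a crude method chosen for transparency, not efficiency).

```python
import numpy as np

# Hypothetical test norm (assumption): Randers norm ||v|| = |v| + <b, v>, |b| < 1.
b = np.array([0.3, -0.2])


def norm(v):
    return np.linalg.norm(v, axis=-1) + v @ b


def legendre(x):
    """L(x)_j = (1/2) d(||.||^2)/dx^j as in (2.5); for this norm it equals
    ||x|| (x/|x| + b).  Covectors are identified with R^2 via the dot product."""
    return norm(x) * (x / np.linalg.norm(x) + b)


# Fine grid of vectors with ||v|| = 1 (Euclidean directions rescaled by the norm).
angles = np.linspace(0.0, 2.0 * np.pi, 200_000, endpoint=False)
dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
unit_sphere = dirs / norm(dirs)[:, None]


def dual_norm(xi):
    """||xi||_* = max of xi(v) over ||v|| = 1, approximated on the grid."""
    return np.max(unit_sphere @ xi)


def legendre_inverse(xi):
    """L^*(xi): the vector v with ||v|| = ||xi||_* and xi(v) = ||xi||_*^2,
    approximated by the grid maximiser of xi on the unit sphere."""
    values = unit_sphere @ xi
    k = np.argmax(values)
    return values[k] * unit_sphere[k]


x = np.array([1.0, 2.0])
print(legendre(x) @ x, norm(x) ** 2)       # [L(x)](x) = ||x||^2
print(dual_norm(legendre(x)), norm(x))     # ||L(x)||_* = ||x||

# Gradient of the linear function f(y) = <c, y>: Df = c, so nabla f = L^*(c).
c = np.array([0.5, -1.5])
grad_f = legendre_inverse(c)
print(legendre(grad_f), c)                 # L(L^*(c)) recovers c up to grid error
```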



Remark 2.1 We need the strong convexity of the norm to formulate and investigate the skew convexity of functions as well as the heat equation, while the characterization of inner products (Claim 6.2) is valid among merely 'convex' Minkowski norms (i.e., those whose closed unit ball is a closed convex set containing the origin as an interior point). In addition, the strict convexity will be a necessary condition when one studies the contractivity of gradient flows (see Remark 4.1).
2.2 Finsler manifolds



Let $M$ be a connected $C^\infty$-manifold without boundary. A nonnegative function $F : TM \to [0,\infty)$ is called a $C^\infty$-Finsler structure if it is $C^\infty$ on $TM \setminus \{0\}$ ($\{0\}$ stands for the zero section) and if $F|_{T_xM}$ is a Minkowski norm for all $x \in M$. We call $(M,F)$ a $C^\infty$-Finsler manifold. (We will consider only $C^\infty$-structures for simplicity.)

For each $v \in T_xM \setminus \{0\}$, we define the inner product $g_v$ on $T_xM$ according to (2.3). That is to say, given a local coordinate $(x^i)_{i=1}^n$ on an open set $U$ containing $x$, we consider the coordinate of $T_xM$ as $v = \sum_{i=1}^n v^i (\partial/\partial x^i)|_x$ and define

$$ g_v\Big(\sum_{i=1}^n a^i \frac{\partial}{\partial x^i}\Big|_x, \sum_{j=1}^n b^j \frac{\partial}{\partial x^j}\Big|_x\Big) := \sum_{i,j=1}^n \frac{1}{2}\frac{\partial^2(F^2)}{\partial v^i\,\partial v^j}(v)\, a^i b^j. $$

We denote by $\mathsf{C}(x)$ and $\mathsf{S}(x)$ the 2-uniform convexity and smoothness constants of $F|_{T_xM}$. For a function $f : M \to \mathbb{R}$ differentiable at $x \in M$, define the gradient vector of $f$ at $x$ by $\nabla f(x) := \mathcal{L}^*(Df(x))$ via the Legendre transform $\mathcal{L}^* : T_x^*M \to T_xM$.

The distance from $x$ to $y$ is naturally defined as $d(x,y) := \inf_\gamma \int_0^1 F(\dot\gamma)\,dt$, where $\gamma : [0,1] \to M$ runs over all differentiable curves from $x$ to $y$. We remark that $d$ is nonsymmetric in general, namely $d(y,x) \ne d(x,y)$ may happen. A geodesic $\gamma : [0,l] \to M$ is a locally length minimizing curve of constant speed (i.e., $F(\dot\gamma)$ is constant). We say that $(M,F)$ is forward complete if any geodesic $\gamma : [0,1] \to M$ is extended to a geodesic $\gamma : [0,\infty) \to M$. Then, for any $x, y \in M$, there is a minimal geodesic from $x$ to $y$ by the Hopf-Rinow theorem ([BCS, Theorem 6.6.1]).

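Along a geodesic $\gamma : [0,l] \to M$, $\gamma(s)$ with $s \in (0,l]$ is called a cut point of $\gamma(0)$ if $\gamma|_{[0,s]}$ is minimal and if $\gamma|_{[0,s+\varepsilon]}$ is not minimal for any $\varepsilon > 0$. Suppose that $\gamma(s)$ is not a cut point of $\gamma(0)$ for all $s \in (0,l]$, and let $\xi$ and $\zeta$ be differentiable curves with $\xi(0) = \gamma(0)$ and $\zeta(0) = \gamma(l)$. Then we have the following first variation formula ([BCS, Exercise 5.2.4]):

$$ \frac{\partial}{\partial t}\Big|_{t=0} d(\xi(t), \zeta(t)) = \frac{g_{\dot\gamma(l)}(\dot\gamma(l), \dot\zeta(0)) - g_{\dot\gamma(0)}(\dot\gamma(0), \dot\xi(0))}{F(\dot\gamma)}, \tag{2.6} $$

where $F(\dot\gamma) = d(\gamma(0), \gamma(l))/l$ is the constant speed of $\gamma$ (so the denominator is $d(\gamma(0),\gamma(l))$ when $l = 1$). As usual in discussing the contraction property, this formula will play a vital role.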
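In a Minkowski space the minimal geodesic from $x$ to $y$ is the segment $\gamma(s) = x + s(y - x)$, so (2.6) can be compared with a finite-difference derivative of $t \mapsto d(\xi(t), \zeta(t)) = \|\zeta(t) - \xi(t)\|$. The sketch assumes a hypothetical Randers norm and straight curves $\xi(t) = x + tu$, $\zeta(t) = y + tw$; the helper names are illustrative.

```python
import numpy as np

# Hypothetical test norm (assumption): Randers norm F(v) = |v| + <b, v>, |b| < 1.
b = np.array([0.3, -0.2])


def F(v):
    return np.linalg.norm(v) + b @ v


def g(v):
    """g_v, the Hessian of F^2/2 at v != 0 (closed form for this norm)."""
    r = np.linalg.norm(v)
    u = v / r
    return (F(v) / r) * (np.eye(2) - np.outer(u, u)) + np.outer(u + b, u + b)


def distance(x, y):
    # Straight segments are minimal geodesics in a Minkowski space: d(x, y) = F(y - x).
    return F(y - x)


def first_variation(x, y, xi_dot, zeta_dot):
    """Right-hand side of (2.6) for gamma(s) = x + s (y - x), s in [0, 1].
    Here gamma' = y - x is constant, so g at both endpoints is the same g_v."""
    v = y - x
    return (v @ g(v) @ zeta_dot - v @ g(v) @ xi_dot) / F(v)


x, y = np.array([0.0, 0.0]), np.array([1.0, 2.0])
u, w = np.array([0.4, -1.0]), np.array([-0.7, 0.3])   # xi(t) = x + t u, zeta(t) = y + t w
h = 1e-6
fd = (distance(x + h * u, y + h * w) - distance(x - h * u, y - h * w)) / (2 * h)
print(fd, first_variation(x, y, u, w))    # agree up to finite-difference error
print(distance(x, y), distance(y, x))     # d is not symmetric
```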

It is sometimes useful to consider the reverse Finsler structure $\overleftarrow{F}(v) := F(-v)$. We will put an arrow $\leftarrow$ on those associated with $\overleftarrow{F}$, for example, $\overleftarrow{d}(x,y) = d(y,x)$ and

$$ \overleftarrow{\nabla} f = -\nabla(-f). $$



3 Skew convex functions 

We introduce the skew convexity of functions on a $C^\infty$-Finsler manifold $(M,F)$, and will see that it is equivalent to the contractivity of their gradient flows. Although we shall work with $C^1$-functions for simplicity, the same technique is also applicable to other classes of functions (e.g., locally semi-convex functions, see Remark 3.3 below).

Let us begin with the standard notion of convexity along geodesics. A function $f : M \to [-\infty, \infty]$ is said to be $K$-convex (or geodesically $K$-convex) for $K \in \mathbb{R}$ if

$$ f(\gamma(t)) \le (1-t) f(\gamma(0)) + t f(\gamma(1)) - \frac{K}{2}(1-t)t\, d(\gamma(0), \gamma(1))^2 $$

holds for all geodesics $\gamma : [0,1] \to M$ and $t \in [0,1]$. If $f$ is $C^2$, then this is equivalent to $\partial^2[f \circ \gamma]/\partial t^2 \ge K d(\gamma(0), \gamma(1))^2$ and to

$$ \frac{\partial}{\partial t}\big[Df(\gamma(t))(\dot\gamma(t))\big] \ge K d(\gamma(0), \gamma(1))^2. $$

Now, instead of $Df(\gamma)(\dot\gamma) = -g_{\nabla(-f)(\gamma)}(\nabla(-f)(\gamma), \dot\gamma)$ in the left hand side, we employ $-[\mathcal{L}(\dot\gamma)](\nabla(-f)(\gamma)) = -g_{\dot\gamma}(\dot\gamma, \nabla(-f)(\gamma))$ for the skew convexity.

Definition 3.1 (Skew convex functions) Let $f : M \to \mathbb{R}$ be a $C^1$-function. We say that $f$ is $K$-skew convex for $K \in \mathbb{R}$ if, for any pair of distinct points $x, y \in M$, there is a minimal geodesic $\gamma : [0,1] \to M$ from $x$ to $y$ such that

$$ g_{\dot\gamma(1)}\big(\dot\gamma(1), \nabla(-f)(y)\big) - g_{\dot\gamma(0)}\big(\dot\gamma(0), \nabla(-f)(x)\big) \le -K d(x,y)^2. \tag{3.1} $$

Recall that, on a Riemannian manifold $(M,g)$, it holds $g_{\dot\gamma} = g$ and (3.1) indeed implies the $K$-convexity of $f$. In the Finsler setting, however, $g_{\dot\gamma}$ is different from $g_{\nabla(-f)(\gamma)}$.

For a $C^1$-function $f : M \to \mathbb{R}$ and any point $x \in M$, there exists a $C^1$-curve $\xi : [0,\infty) \to M$ satisfying $\xi(0) = x$ and $\dot\xi(t) = \nabla(-f)(\xi(t))$ for all $t$. We call such $\xi$ a gradient curve of $f$. For $K \in \mathbb{R}$, we say that the gradient flow of $f$ is $K$-contractive if

$$ d(\xi(t), \zeta(t)) \le e^{-Kt}\, d(\xi(0), \zeta(0)) $$

holds for all gradient curves $\xi, \zeta$ and $t \in [0,\infty)$. Comparing (3.1) with (2.6), we verify that the $K$-contractivity is equivalent to the $K$-skew convexity.
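In the Euclidean case the definition of $K$-contractivity can be checked directly on a quadratic function, whose gradient flow is linear. The sketch assumes a hypothetical symmetric positive definite matrix $A$ and $f(x) = \langle Ax, x\rangle/2$, which is $K$-convex with $K$ the smallest eigenvalue of $A$; it only illustrates the Riemannian baseline, not the Finsler phenomenon studied here.

```python
import numpy as np


def gradient_curve(A, x0, t):
    """Gradient curve xi(t) = exp(-tA) x0 of f(x) = <Ax, x>/2 in Euclidean R^n,
    i.e. the solution of xi' = nabla(-f)(xi) = -A xi (A symmetric)."""
    lam, Q = np.linalg.eigh(A)
    return Q @ (np.exp(-t * lam) * (Q.T @ x0))


A = np.array([[2.0, 0.5], [0.5, 1.0]])   # hypothetical positive definite matrix
K = np.linalg.eigvalsh(A)[0]              # f is K-convex for the smallest eigenvalue K
x0, y0 = np.array([1.0, -2.0]), np.array([-0.5, 0.7])

for t in np.linspace(0.0, 3.0, 7):
    d_t = np.linalg.norm(gradient_curve(A, x0, t) - gradient_curve(A, y0, t))
    bound = np.exp(-K * t) * np.linalg.norm(x0 - y0)
    print(f"t={t:.1f}  d={d_t:.6f}  e^(-Kt) d(0)={bound:.6f}")
```

The bound is sharp: for initial points differing by an eigenvector of the smallest eigenvalue, the distance decays exactly like $e^{-Kt}$.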

Theorem 3.2 Let $(M,F)$ be a forward complete Finsler manifold, and let $f : M \to \mathbb{R}$ be a $C^1$-function. Then the gradient flow of $f$ is $K$-contractive if and only if $f$ is $K$-skew convex.

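Proof. We first assume that $f$ is $K$-skew convex. Fix two gradient curves $\xi, \zeta : [0,\infty) \to M$ of $f$ and set $l(t) := d(\xi(t), \zeta(t))$. Given $t > 0$, let $\gamma : [0,1] \to M$ be a minimal geodesic from $\xi(t)$ to $\zeta(t)$ such that (3.1) holds. Note that $\gamma(1/2)$ ($\zeta(t)$, resp.) is not a cut point of $\xi(t)$ ($\gamma(1/2)$, resp.). Thus the first variation formula (2.6) shows that, together with the triangle inequality,

$$ \limsup_{\varepsilon \downarrow 0} \frac{l(t+\varepsilon) - l(t)}{\varepsilon} \le \lim_{\varepsilon \downarrow 0} \frac{d(\xi(t+\varepsilon), \gamma(1/2)) - d(\xi(t), \gamma(1/2))}{\varepsilon} + \lim_{\varepsilon \downarrow 0} \frac{d(\gamma(1/2), \zeta(t+\varepsilon)) - d(\gamma(1/2), \zeta(t))}{\varepsilon} $$

$$ = -g_{\dot\gamma(0)}\Big(\frac{\dot\gamma(0)}{l(t)}, \dot\xi(t)\Big) + g_{\dot\gamma(1)}\Big(\frac{\dot\gamma(1)}{l(t)}, \dot\zeta(t)\Big). $$

Since $\dot\xi(t) = \nabla(-f)(\xi(t))$ and $\dot\zeta(t) = \nabla(-f)(\zeta(t))$, the hypothesis (3.1) yields $l'(t) \le -K l(t)$ for a.e. $t$. Therefore we deduce from Gronwall's theorem that $d(\xi(t), \zeta(t)) \le e^{-Kt} d(\xi(0), \zeta(0))$ holds.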

To see the converse, suppose that the gradient flow of $f$ is $K$-contractive and take a minimal geodesic $\gamma : [0,1] \to M$. Dividing $\gamma$ into $\gamma|_{[0,1/2]}$ and $\gamma|_{[1/2,1]}$ if necessary, we can assume that $\gamma(s)$ is not a cut point of $\gamma(0)$ for all $s \in (0,1]$. Consider gradient curves $\xi, \zeta : [0,\infty) \to M$ of $f$ with $\xi(0) = \gamma(0)$ and $\zeta(0) = \gamma(1)$, and put $l(t) := d(\xi(t), \zeta(t))$ again. Then it follows from the assumption that

$$ \frac{d}{dt}\Big|_{t=0+}\big[e^{Kt} l(t)\big] \le 0. $$

This immediately implies the $K$-skew convexity, as the first variation formula (2.6) shows

$$ \frac{d}{dt}\Big|_{t=0+}\big[e^{Kt} l(t)\big] = K l(0) + \frac{d}{dt}\Big|_{t=0+} l(t) = K l(0) + g_{\dot\gamma(1)}\Big(\frac{\dot\gamma(1)}{l(0)}, \dot\zeta(0)\Big) - g_{\dot\gamma(0)}\Big(\frac{\dot\gamma(0)}{l(0)}, \dot\xi(0)\Big). $$

□

Remark 3.3 We can replace the $C^1$-regularity in Definition 3.1 with the local semi-convexity as follows (cf., e.g., [Ly], [Oh1] for details). We say that a function $f : M \to \mathbb{R}$ is locally semi-convex if, for any $x \in M$, there are an open set $U \ni x$ and $K \in \mathbb{R}$ such that $f|_U$ is $K$-convex along any geodesic $\gamma : [0,1] \to U$. Define the local slope of $f$ at $x \in M$ as

$$ |\nabla_- f|(x) := \limsup_{y \to x} \frac{\max\{f(x) - f(y), 0\}}{d(x,y)}. \tag{3.2} $$

For each $x \in M$ with $|\nabla_- f|(x) > 0$, there exists a unique unit vector $v \in T_xM$ satisfying $\lim_{t \downarrow 0}\{f(\gamma(t)) - f(x)\}/t = -|\nabla_- f|(x)$, where $\gamma$ is the geodesic with $\dot\gamma(0) = v$. We define $\nabla_- f(x) := |\nabla_- f|(x) \cdot v$, and $\nabla_- f(x) := 0$ if $|\nabla_- f|(x) = 0$. Then, from any initial point, there starts a gradient curve $\xi$ solving $\dot\xi(t) = \nabla_- f(\xi(t))$ for a.e. $t$. The $K$-skew convexity can be defined by using $\nabla_- f$ instead of $\nabla(-f)$ in (3.1), and the analogue of Theorem 3.2 holds by the same argument.



4 Skew convexity of squared norms 

We study the skew convexity of the squared norm of a Minkowski space, and compare it with the usual convexity along straight lines. The more general case of distance functions on Finsler manifolds will be treated in the Appendix.

Let $(\mathbb{R}^n, \|\cdot\|)$ be a Minkowski space and set $f(x) := \|-x\|^2/2$. Observe $\nabla(-f)(x) = -x$, so that the gradient curve $\xi$ of $f$ with $\xi(0) = x$ is given by $\xi(t) = e^{-t}x$. Thus we see that the gradient flow of $f$ is 1-contractive, which shows that $f$ is 1-skew convex. This can be proved also by a direct calculation as, for any $x \in \mathbb{R}^n$ and $v \in \mathbb{R}^n \setminus \{0\}$,

$$ g_v\big(v, \nabla(-f)(x+v)\big) - g_v\big(v, \nabla(-f)(x)\big) = -g_v(v,v) = -\|v\|^2. $$
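The identity $\nabla(-f)(x) = -x$ and the direct calculation above can be verified numerically. This sketch assumes a hypothetical Randers norm (not from the text), checks $D(-f)(x) = \mathcal{L}(-x)$ by finite differences, and then evaluates the left-hand side of (3.1) for $y = x + v$.

```python
import numpy as np

# Hypothetical test norm (assumption): Randers norm F(v) = |v| + <b, v>, |b| < 1.
b = np.array([0.3, -0.2])


def F(v):
    return np.linalg.norm(v) + b @ v


def L(v):
    """Legendre transform (2.5) of this norm: F(v) (v/|v| + b)."""
    return F(v) * (v / np.linalg.norm(v) + b)


def g(v):
    """g_v, the Hessian of F^2/2 at v != 0."""
    r = np.linalg.norm(v)
    u = v / r
    return (F(v) / r) * (np.eye(2) - np.outer(u, u)) + np.outer(u + b, u + b)


def f(x):
    return 0.5 * F(-x) ** 2          # f(x) = ||-x||^2 / 2 = d(x, 0)^2 / 2


def D_minus_f(x, h=1e-6):
    """Central-difference approximation of the differential D(-f)(x)."""
    e = np.eye(2) * h
    return np.array([-(f(x + e[i]) - f(x - e[i])) / (2 * h) for i in range(2)])


def grad_minus_f(z):
    return -z                        # nabla(-f)(z) = L^*(L(-z)) = -z


x, v = np.array([0.8, -0.3]), np.array([1.0, 2.0])
print(np.abs(D_minus_f(x) - L(-x)).max())   # D(-f)(x) = L(-x), up to rounding

lhs = v @ g(v) @ grad_minus_f(x + v) - v @ g(v) @ grad_minus_f(x)   # left side of (3.1)
print(lhs, -F(v) ** 2)                      # equal, so f is 1-skew convex
```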

Remark 4.1 The above example obviously requires the strict convexity of the norm. In fact, even the uniqueness of gradient flows fails in non-strictly convex normed spaces. As pointed out by the referee, a typical example is the same function $f(x) = \|x\|_\infty^2/2$ in the 2-dimensional $\ell_\infty$-space $(\mathbb{R}^2, \|\cdot\|_\infty)$. Any curve $\xi(t) = (e^{-t}, h(t))$ satisfying $|h(t)| \le e^{-t}$ and $|h'(t)| \le e^{-t}$ is a gradient curve of $f$ in the metric sense of [AGS].
In contrast to the 1-skew convexity of $f$ above, we can find a norm $\|\cdot\|$ of $\mathbb{R}^n$ such that the function $f(x) := \langle x, x\rangle/2$ associated with the Euclidean inner product $\langle\cdot,\cdot\rangle$ is not even 0-skew convex, although $f$ is convex along straight lines. To see this, we observe $Df(x) = x$ and $\nabla(-f)(x) = \mathcal{L}^*(-x)$ by identifying both $T_x\mathbb{R}^n$ and $T_x^*\mathbb{R}^n$ with $\mathbb{R}^n$. Choosing $x = -v$ with $v \ne 0$ (and $y = 0$), we have

$$ g_v\big(v, \nabla(-f)(0)\big) - g_v\big(v, \nabla(-f)(-v)\big) = -g_v\big(v, \mathcal{L}^*(v)\big) = -\langle \mathcal{L}(v), \mathcal{L}^*(v)\rangle, $$

where we used (2.5) in the second equality. However, $-\langle \mathcal{L}(v), \mathcal{L}^*(v)\rangle/\|v\|^2$ can be positive. An example is illustrated in Figure 2, where we set $v = (0,1) \in \mathbb{R}^2$. The parallelogram rounded to be strictly convex is the unit sphere of the norm $\|\cdot\|$. Note that $\langle \mathcal{L}(v), v\rangle = \|v\|^2 = 1$, $\langle v, \mathcal{L}^*(v)\rangle = \|\mathcal{L}^*(v)\|^2$ and that $\langle \mathcal{L}(v), \mathcal{L}^*(v)\rangle < 0$.

Figure 2: the rounded parallelogram (unit sphere of $\|\cdot\|$) with the vectors $v$, $\mathcal{L}(v)$ and $\mathcal{L}^*(v)$.



Given any $K < 0$, by scaling $f$ (or the inner product) with sufficiently large $c > 0$, the convex function $cf$ is not $K$-skew convex. This observation reveals that the skew convexity has no (obvious) relation with the usual convexity. In addition, via Theorem 3.2 we have seen that the usual convexity does not imply the contractivity in non-Euclidean normed spaces. This answers the question in [AGS] quoted in the introduction.
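The sign of $\langle \mathcal{L}(v), \mathcal{L}^*(v)\rangle$ can be computed explicitly for a sheared $\ell^p$ norm $\|x\| := \|Mx\|_p$, whose unit sphere for large $p$ is a rounded parallelogram as in Figure 2. The matrix $M$, the exponent $p$ and all names below are hypothetical choices, not the norm drawn in Figure 2; moreover, for $p > 2$ this norm is strictly but not strongly convex (its Hessian degenerates where a coordinate of $Mx$ vanishes), so it only illustrates the sign computation at a generic $v$. Here $\mathcal{L}(x) = M^{\mathsf T}\mathcal{L}_p(Mx)$ and $\mathcal{L}^*(\xi) = M^{-1}\mathcal{L}_q(M^{-\mathsf T}\xi)$ with $q = p/(p-1)$, by the chain rule and $\ell^p$-$\ell^q$ duality.

```python
import numpy as np

# Hypothetical norm (assumption, not the norm drawn in Figure 2): ||x|| = ||M x||_p.
# For p > 2 its unit sphere is a rounded parallelogram.
M = np.array([[1.0, 2.0], [0.1, 1.0]])


def lp_legendre(u, p):
    """Gradient of ||u||_p^2 / 2, i.e. the Legendre transform of the l^p norm."""
    nrm = np.sum(np.abs(u) ** p) ** (1.0 / p)
    return nrm ** (2.0 - p) * np.sign(u) * np.abs(u) ** (p - 1.0)


def norm(x, p):
    return np.sum(np.abs(M @ x) ** p) ** (1.0 / p)


def legendre(x, p):
    # L(x) = M^T L_p(M x), by the chain rule for ||M x||_p^2 / 2.
    return M.T @ lp_legendre(M @ x, p)


def legendre_inverse(xi, p):
    # L^*(xi) = M^{-1} L_q(M^{-T} xi) with q = p/(p-1): gradient of ||xi||_*^2 / 2.
    q = p / (p - 1.0)
    return np.linalg.solve(M, lp_legendre(np.linalg.solve(M.T, xi), q))


def pairing(v, p):
    """<L(v), L^*(v)>; minus this is the left-hand side of (3.1) for
    f(x) = <x, x>/2 and the pair of points x = -v, y = 0."""
    return legendre(v, p) @ legendre_inverse(v, p)


v = np.array([0.0, 1.0])
print(pairing(v, 2.0))   # inner product case: approximately <v, v> = 1
print(pairing(v, 8.0))   # negative, so f is not 0-skew convex for this norm

# Gradient curves from x = -v and y = 0: zeta(t) = 0 stays put (nabla(-f)(0) = 0),
# while xi'(0) = nabla(-f)(-v) = L^*(v).  One short Euler step of xi already
# increases d(xi(t), zeta(t)) = ||0 - xi(t)||, so the flow is not 0-contractive.
p, h = 8.0, 1e-4
xi_h = -v + h * legendre_inverse(v, p)
print(norm(-xi_h, p) > norm(v, p))   # True
```

Because $\mathcal{L}^*$ is positively homogeneous, replacing $f$ by $cf$ multiplies the left-hand side of (3.1) by $c$, which is the scaling argument used above.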

5 Heat flow on Minkowski spaces 

In order to apply our technique to the heat flow on Minkowski spaces, we regard it as 
the gradient flow of the relative entropy with respect to the reverse Wasserstein distance. 
We refer to [AGS] and [Vi] for Wasserstein geometry as well as the gradient flow theory. 
Throughout the section, let $(\mathbb{R}^n, \|\cdot\|)$ be a Minkowski space in the sense of Subsection 2.1.

5.1 Wasserstein geometry over Minkowski spaces 

Let $\mathcal{P}(\mathbb{R}^n)$ be the set of Borel probability measures on $\mathbb{R}^n$. Define $\mathcal{P}_2(\mathbb{R}^n) \subset \mathcal{P}(\mathbb{R}^n)$ as the set of measures $\mu$ satisfying $\int_{\mathbb{R}^n} \|x\|^2\,d\mu < \infty$ (note that then $\int_{\mathbb{R}^n} \|-x\|^2\,d\mu < \infty$ holds as well). The subset of absolutely continuous measures with respect to the Lebesgue measure $dx$ will be denoted by $\mathcal{P}_2^{ac}(\mathbb{R}^n) \subset \mathcal{P}_2(\mathbb{R}^n)$.

Given $\mu, \nu \in \mathcal{P}_2(\mathbb{R}^n)$, a probability measure $\pi \in \mathcal{P}(\mathbb{R}^n \times \mathbb{R}^n)$ is called a coupling of $\mu$ and $\nu$ if $\pi(A \times \mathbb{R}^n) = \mu(A)$ and $\pi(\mathbb{R}^n \times A) = \nu(A)$ hold for all Borel sets $A \subset \mathbb{R}^n$. We define the $L^2$-Wasserstein distance from $\mu$ to $\nu$ by

$$ W_2(\mu, \nu) := \inf_\pi \Big(\int_{\mathbb{R}^n \times \mathbb{R}^n} \|y - x\|^2\, d\pi(x,y)\Big)^{1/2}, $$

where $\pi$ runs over all couplings of $\mu$ and $\nu$. A coupling $\pi$ attaining the infimum above is said to be optimal. We call $(\mathcal{P}_2(\mathbb{R}^n), W_2)$ the $L^2$-Wasserstein space over $(\mathbb{R}^n, \|\cdot\|)$.
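For measures with finitely many atoms, the infimum in the definition of $W_2$ becomes a finite assignment problem. The sketch below assumes uniform empirical measures with the same number $N$ of atoms and a hypothetical Randers norm; it uses the standard fact that an optimal coupling can then be chosen among permutations (Birkhoff-von Neumann) and brute-forces them, which is only practical for very small $N$. It also shows that $W_2$ is nonsymmetric for a nonsymmetric norm.

```python
import itertools

import numpy as np

# Hypothetical non-symmetric norm (assumption): ||v|| = |v| + <b, v> with |b| < 1.
b = np.array([0.3, -0.2])


def norm(v):
    return np.linalg.norm(v) + b @ v


def w2_empirical(xs, ys):
    """W_2 from mu = (1/N) sum_i delta_{x_i} to nu = (1/N) sum_j delta_{y_j}
    with cost ||y - x||^2.  For uniform weights some optimal coupling is a
    permutation, so brute force over all permutations is exact for small N."""
    n = len(xs)
    cost = np.array([[norm(y - x) ** 2 for y in ys] for x in xs])
    best = min(sum(cost[i, s[i]] for i in range(n))
               for s in itertools.permutations(range(n)))
    return np.sqrt(best / n)


x, y = np.array([0.0, 0.0]), np.array([1.0, 0.0])
print(w2_empirical([x], [y]), w2_empirical([y], [x]))   # approximately 1.3 and 0.7
```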

Remark 5.1 (a) Thanks to (2.2), our norm is comparable to an inner product. In fact, (2.4) yields $\mathsf{C}^{-1}\|y\| \le \sqrt{g_x(y,y)} \le \mathsf{S}\|y\|$. Then, if we denote by $W_2^x$ the Wasserstein distance with respect to $g_x$, we have $\mathsf{C}^{-1} W_2(\mu,\nu) \le W_2^x(\mu,\nu) \le \mathsf{S} W_2(\mu,\nu)$. This relation is sometimes useful to apply known results in the Euclidean case.

(b) The least constant $c \ge 1$ satisfying $\|y\|^2 \le \langle y, y\rangle \le c\|y\|^2$ for some inner product $\langle\cdot,\cdot\rangle$ and all $y \in \mathbb{R}^n$ cannot be bounded only by the dimension $n$, unlike John's theorem for symmetric norms ($c \le n$). For instance, consider the norm whose unit sphere is the standard unit sphere, but with the center $(1 - \varepsilon, 0, \ldots, 0)$. Letting $\varepsilon \downarrow 0$, we have $c \to \infty$ (and $\mathsf{C}, \mathsf{S} \to \infty$).
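The example in (b) can be made quantitative with an elementary bound. Applying $\|y\|^2 \le \langle y, y\rangle \le c\|y\|^2$ to $y$ and to $-y$ gives $\|-y\|^2 \le \langle y, y\rangle \le c\|y\|^2$, hence $c \ge (\|-y\|/\|y\|)^2$ for every $y \ne 0$, whatever the inner product. The sketch computes the Minkowski functional of the shifted unit ball (our own closed-form root of a quadratic equation, with illustrative function names) and evaluates this bound for $y = e_1$, where it equals $((2-\varepsilon)/\varepsilon)^2$.

```python
import numpy as np


def shifted_ball_norm(v, center):
    """Minkowski functional of the Euclidean unit ball centred at `center`
    (|center| < 1): ||v|| = 1/s, where s > 0 solves |s v - center| = 1."""
    a = v @ v
    vc = v @ center
    s = (vc + np.sqrt(vc ** 2 + a * (1.0 - center @ center))) / a
    return 1.0 / s


def lower_bound_c(eps, n=2):
    """If ||y||^2 <= <y, y> <= c ||y||^2 for an inner product <.,.>, then
    ||-y||^2 <= <y, y> <= c ||y||^2, so c >= (||-e_1|| / ||e_1||)^2."""
    center = np.zeros(n)
    center[0] = 1.0 - eps
    e1 = np.zeros(n)
    e1[0] = 1.0
    return (shifted_ball_norm(-e1, center) / shifted_ball_norm(e1, center)) ** 2


for eps in [0.5, 0.1, 0.01]:
    print(eps, lower_bound_c(eps))   # equals ((2 - eps) / eps)^2, unbounded as eps -> 0
```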

For $\mu \in \mathcal{P}_2^{ac}(\mathbb{R}^n)$ and $\nu \in \mathcal{P}_2(\mathbb{R}^n)$, there exists a semi-convex function $\varphi$ on an open set $\Omega \subset \mathbb{R}^n$ with $\mu(\Omega) = 1$ such that $\pi := (\mathrm{id}_{\mathbb{R}^n} \times \mathcal{T}_1)_\sharp \mu$ provides the unique optimal coupling of $\mu$ and $\nu$, where we set $\mathcal{T}_t(x) := x + t\nabla\varphi(x)$ for $t \in [0,1]$ (by, e.g., [Vi, Theorem 10.26] under the conditions (locLip), (SC), (H∞)). Moreover, $\mu_t := (\mathcal{T}_t)_\sharp \mu$ is the unique minimal geodesic from $\mu_0 = \mu$ to $\mu_1 = \nu$. Note that $\varphi$ is twice differentiable a.e. on $\Omega$ in the sense of Alexandrov, thus $\mathcal{T}_t$ is well-defined and differentiable a.e. on $\Omega$.

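We introduce a Finsler structure of the Wasserstein space along the lines of [Ot]; see [OS1] for more details in the case of compact Finsler manifolds. We set

$$ \mathcal{TP} := \{\Phi = \nabla\varphi \mid \varphi \in C_c^\infty(\mathbb{R}^n)\} $$

and define the tangent space $(T_\mu\mathcal{P}, F_\mu)$ at $\mu \in \mathcal{P}_2(\mathbb{R}^n)$ as the completion of $\mathcal{TP}$ with respect to the Minkowski norm

$$ F_\mu(\Phi) := \Big(\int_{\mathbb{R}^n} \|\Phi\|^2\, d\mu\Big)^{1/2} $$

(of the space of measurable vector fields $\Phi$ with $F_\mu(\Phi) < \infty$). Similarly, the cotangent space $(T_\mu^*\mathcal{P}, F_\mu^*)$ is defined as the completion of $\mathcal{T}^*\mathcal{P} := \{\alpha = D\varphi \mid \varphi \in C_c^\infty(\mathbb{R}^n)\}$ with respect to $F_\mu^*(\alpha) := (\int_{\mathbb{R}^n} \|\alpha\|_*^2\, d\mu)^{1/2}$. We define the Legendre transform $\mathcal{L}_\mu^* : T_\mu^*\mathcal{P} \to T_\mu\mathcal{P}$ in the pointwise way that $\mathcal{L}_\mu^*(D\varphi) = \nabla\varphi$.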

We say that a curve $(\mu_t)_{t \in I} \subset \mathcal{P}_2(\mathbb{R}^n)$ on an open interval $I \subset \mathbb{R}$ is (2-)absolutely continuous if there is some function $h \in L^2(I)$ such that

$$ W_2(\mu_t, \mu_r) \le \int_t^r h(\tau)\, d\tau $$

holds for all $t, r \in I$ with $t < r$. Note that an absolutely continuous curve is continuous.
Given an absolutely continuous curve $(\mu_t)_{t \in I}$, the forward absolute gradient (or the metric speed)

$$ |\dot\mu_t| := \lim_{r \to t} \frac{W_2(\mu_{\min\{r,t\}}, \mu_{\max\{r,t\}})}{|r - t|} $$

is well-defined for a.e. $t \in I$ ([AGS, Theorem 1.1.2], [OS1, Lemma 7.1]). We can associate $(\mu_t)_{t \in I}$ with a Borel vector field $\Phi$ on $I \times \mathbb{R}^n$ (with $\Phi_t(x) := \Phi(t,x) \in T_x\mathbb{R}^n$) satisfying

• $\Phi_t \in T_{\mu_t}\mathcal{P}$ at a.e. $t \in I$,

• the continuity equation $\partial_t \mu_t + \mathrm{div}(\Phi_t \mu_t) = 0$ in the weak sense that

$$ \int_I \int_{\mathbb{R}^n} \{\partial_t \psi + D\psi(\Phi)\}\, d\mu_t\, dt = 0 \tag{5.1} $$

for all $\psi \in C_c^\infty(I \times \mathbb{R}^n)$ ([AGS, Theorem 8.3.1], [OS1, Theorem 7.3]).

Such a vector field $\Phi$ is unique up to a difference on a null measure set with respect to $d\mu_t\,dt$, and we have $F_{\mu_t}(\Phi_t) = |\dot\mu_t|$ for a.e. $t \in I$. We will call $\Phi$ the tangent vector field of the curve $(\mu_t)_{t \in I}$ and write $\dot\mu_t = \Phi_t$.

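Now, consider a function $\mathcal{Q} : \mathcal{P}_2(\mathbb{R}^n) \to [-\infty, \infty]$. We say that $\mathcal{Q}$ is differentiable at $\mu \in \mathcal{P}_2(\mathbb{R}^n)$ if $-\infty < \mathcal{Q}(\mu) < \infty$ and if there is some $\alpha \in T_\mu^*\mathcal{P}$ such that

$$ \int_{\mathbb{R}^n} \alpha(\Phi)\, d\mu \ge \limsup_{t \downarrow 0} \frac{\mathcal{Q}(\mu_t) - \mathcal{Q}(\mu)}{t} \tag{5.2} $$

holds for every minimal geodesic $(\mu_t)_{t \in [0,1]}$ with $\mu_t = (\mathcal{T}_t)_\sharp \mu$ and $\mathcal{T}_t(x) = x + t\Phi(x)$, and that equality holds in (5.2) if $\Phi \in \mathcal{TP}$ (with $\lim_{t \downarrow 0}$ in place of $\limsup_{t \downarrow 0}$). Such a one-form $\alpha$ is unique in $T_\mu^*\mathcal{P}$ up to a difference on a $\mu$-null measure set. Thus we write $D\mathcal{Q}(\mu) = \alpha$ and define the gradient vector of $\mathcal{Q}$ at $\mu$ by $\nabla_W \mathcal{Q}(\mu) := \mathcal{L}_\mu^*(D\mathcal{Q}(\mu))$.

Definition 5.2 (Gradient curves in $(\mathcal{P}_2(\mathbb{R}^n), W_2)$) We say that an absolutely continuous curve $(\mu_t)_{t \in [0,T)} \subset \mathcal{P}_2(\mathbb{R}^n)$ with $T \in (0,\infty]$ is a gradient curve of $\mathcal{Q}$ if $\dot\mu_t = \nabla_W(-\mathcal{Q})(\mu_t)$ holds for a.e. $t \in (0,T)$.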

We remark that the differentiability of $-\mathcal{Q}$ at $\mu_t$ for a.e. $t \in (0,T)$ is included in the above definition.

5.2 Nonlinear heat equation and global solutions 

We define the (distributional) Finsler Laplacian $\Delta$ acting on $u \in H^1_{\mathrm{loc}}(\mathbb{R}^n)$ by

$$ \int_{\mathbb{R}^n} \psi\, \Delta u\, dx := -\int_{\mathbb{R}^n} D\psi(\nabla u)\, dx $$

for all $\psi \in C_c^\infty(\mathbb{R}^n)$. Note that $\Delta$ is a nonlinear operator since the Legendre transform is nonlinear (unless the norm $\|\cdot\|$ comes from an inner product). We consider the associated heat equation $\partial_t u = \Delta u$ also in the weak form. We have seen in [OS1, Theorem 3.4] that, given $u_0 \in H_0^1(\mathbb{R}^n)$ and $T > 0$, there exists a unique global solution $u \in L^2([0,T], H_0^1(\mathbb{R}^n)) \cap H^1([0,T], L^2(\mathbb{R}^n))$ to $\partial_t u = \Delta u$ in the weak sense that

$$ \int_{\mathbb{R}^n} \psi\, \partial_t u\, dx = -\int_{\mathbb{R}^n} D\psi(\nabla u)\, dx \tag{5.3} $$

holds for all $t \in [0,T]$ and $\psi \in H_0^1(\mathbb{R}^n)$.
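In dimension one the nonlinearity of $\Delta$ is already visible: a Minkowski norm on $\mathbb{R}$ has the form $\|x\| = Ax$ for $x \ge 0$ and $\|x\| = B|x|$ for $x < 0$, and it is an inner product only if $A = B$. The sketch below discretizes the weak form (5.3) with an explicit, conservative finite-volume scheme (a standard numerical method chosen for illustration, not one used in the text), with hypothetical values of $A$, $B$ and of the initial density. It checks mass preservation as in Lemma 5.3(i) and shows that the asymmetric norm spreads mass faster to one side.

```python
import numpy as np

# Hypothetical 1-D Minkowski norm (assumption): ||x|| = A x for x >= 0 and
# ||x|| = B |x| for x < 0, with A != B, so it is not an inner product.
# Its Legendre transform is L(x) = A^2 x (x > 0), B^2 x (x < 0); hence the
# gradient is nabla u = L^*(u') = u'/A^2 where u' > 0 and u'/B^2 where u' < 0.
A, B = 1.0, 2.0


def legendre_inverse(xi):
    return np.where(xi > 0, xi / A**2, xi / B**2)


def heat_step(u, dx, dt):
    """Explicit finite-volume step for d_t u = (nabla u)' with zero flux at
    both ends: a discrete analogue of the weak form (5.3)."""
    flux = legendre_inverse(np.diff(u) / dx)          # nabla u at cell interfaces
    flux = np.concatenate(([0.0], flux, [0.0]))       # no-flux boundary
    return u + dt * np.diff(flux) / dx


def solve(u0, dx, t_end):
    # dt <= dx^2 / (2 * largest coefficient) makes every update a convex
    # combination of neighbouring values, so positivity is preserved.
    dt = 0.4 * dx**2 / max(1 / A**2, 1 / B**2)
    u = u0.copy()
    for _ in range(int(t_end / dt)):
        u = heat_step(u, dx, dt)
    return u


x = np.linspace(-5.0, 5.0, 401)
dx = x[1] - x[0]
u0 = np.where(np.abs(x) < 1.0, 1.0, 0.0)
u0 /= u0.sum() * dx                                   # a probability density
u = solve(u0, dx, 0.5)
print(u0.sum() * dx, u.sum() * dx)        # mass is preserved, cf. Lemma 5.3(i)
print(u.min() >= 0.0, (x * u).sum() * dx)  # stays nonnegative; the mean drifts left
```

The drift comes from the asymmetry: where $u' > 0$ (the left flank) the effective diffusion coefficient is $1/A^2$, larger than $1/B^2$ on the right flank.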

We can also regard $(u_t)_{t \in [0,T]}$ as a weak solution to the heat equation $\partial_t v = \Delta^{\nabla u_t} v$ associated with the linear, second order, time-dependent differential operator

$$ \Delta^{\nabla u} v := \mathrm{div}\Big(\sum_{i,j=1}^n g^{ij}(\nabla u)\, \frac{\partial v}{\partial x^j}\, \frac{\partial}{\partial x^i}\Big), \tag{5.4} $$

where $(g^{ij})$ stands for the inverse matrix of $(g_{ij})$ and $\nabla u(x)$ is replaced with some nonzero vector if $\nabla u(x) = 0$ (in a measurable way). By virtue of (2.2), $(g^{ij}(\nabla u))$ is globally uniformly elliptic with respect to the Euclidean inner product. Therefore the classical theory due to Nash [Na], Moser [Mo], Aronson [Ar] and others yields the parabolic Harnack inequality as well as the Gaussian estimates from both sides for fundamental solutions (see also [Sa] for the Riemannian case). Moreover, the continuous version of $u$ is $H^2_{\mathrm{loc}}$ in $x$ and $C^{1,\alpha}$ on $(0,\infty) \times \mathbb{R}^n$ ([OS1, Theorems 4.6, 4.9]).

The following lemma allows us to consider $(u_t\,dx)_{t \ge 0}$ as a curve in $\mathcal{P}_2(\mathbb{R}^n)$.

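Lemma 5.3 Let $(u_t)_{t \ge 0} \subset H_0^1(\mathbb{R}^n)$ be a global solution to the heat equation. Then we have the following.

(i) (Mass preserving) If $u_0\,dx \in \mathcal{P}(\mathbb{R}^n)$, then $u_t\,dx \in \mathcal{P}(\mathbb{R}^n)$ for all $t > 0$.

(ii) If $u_0\,dx \in \mathcal{P}_2(\mathbb{R}^n)$, then $u_t\,dx \in \mathcal{P}_2(\mathbb{R}^n)$ for all $t > 0$.

(iii) (Continuity in $W_2$) If $u_0\,dx \in \mathcal{P}_2(\mathbb{R}^n)$, then $\lim_{t \downarrow 0} W_2(u_0\,dx, u_t\,dx) = 0$.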

Proof. (i) This easily follows from the existence of the fundamental solution $q^u$ to the equation $\partial_t v = \Delta^{\nabla u_t} v$. Precisely, $u_t(x) = \int_{\mathbb{R}^n} q^u(t,x;0,y)\, u_0(y)\, dy$ and $\int_{\mathbb{R}^n} q^u(t,x;0,y)\, dx = 1$ imply $\int_{\mathbb{R}^n} u_t\, dx = 1$.

(ii) By virtue of the upper Gaussian bound (cf. [Sa, Corollary 6.2]), we have

$$ q^u(t,x;0,y) \le C_1 t^{-n/2} \exp\Big(-\frac{|x-y|^2}{C_2 t}\Big), $$

where $|\cdot|$ stands for the Euclidean norm and $C_1, C_2$ depend only on $\|\cdot\|$. Thus we obtain

$$ \int_{\mathbb{R}^n} \|x\|^2 u_t(x)\, dx \le \int_{\mathbb{R}^n}\int_{\mathbb{R}^n} 2\big(\|x-y\|^2 + \|y\|^2\big)\, q^u(t,x;0,y)\, u_0(y)\, dy\, dx $$

$$ \le 2C_1 t^{-n/2} \int_{\mathbb{R}^n}\int_{\mathbb{R}^n} \|x-y\|^2 \exp\Big(-\frac{|x-y|^2}{C_2 t}\Big)\, u_0(y)\, dx\, dy + 2\int_{\mathbb{R}^n} \|y\|^2 u_0(y)\, dy $$

$$ \le C_3 t + 2\int_{\mathbb{R}^n} \|y\|^2 u_0(y)\, dy < \infty, $$

where we used the fact $s e^{-s} \le e^{-s/2}$ in the last inequality and $C_3$ depends only on $C_1$, $C_2$, the norm $\|\cdot\|$ and $n$.

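(iii) It suffices to show that $u_t\,dx$ weakly converges to $u_0\,dx$ and

$$ \lim_{R \to \infty} \limsup_{t \downarrow 0} \int_{\{|x| \ge R\}} |x|^2 u_t(x)\, dx = 0 $$

by Remark 5.1(a) (cf. [Vi, Theorem 6.9]). For the weak convergence, thanks to (i) and [AGS, Remark 5.1.6], it is sufficient to show the convergence for test functions $f \in C_c^\infty(\mathbb{R}^n)$. This immediately follows from (5.3); indeed, the Cauchy-Schwarz inequality yields

$$ \Big|\int_0^t\int_{\mathbb{R}^n} f\, \partial_s u\, dx\, ds\Big| = \Big|\int_0^t\int_{\mathbb{R}^n} Df(\nabla u_s)\, dx\, ds\Big| \le \Big(\int_0^t\int_{\mathbb{R}^n} \|Df\|_*^2\, dx\, ds\Big)^{1/2} \Big(\int_0^t\int_{\mathbb{R}^n} \|\nabla u_s\|^2\, dx\, ds\Big)^{1/2} \to 0 $$

as $t$ tends to zero. The latter condition can be seen similarly to (ii). As $|x| \le 2|x - y|$ if $|x| \ge R$ and $|y| \le R/2$, we have

$$ \int_{\{|x| \ge R\}} |x|^2 u_t(x)\, dx \le \int_{\{|x| \ge R\}}\int_{\{|y| > R/2\}} 2\big(|x-y|^2 + |y|^2\big)\, q^u(t,x;0,y)\, u_0(y)\, dy\, dx + \int_{\{|x| \ge R\}}\int_{\{|y| \le R/2\}} 4|x-y|^2\, q^u(t,x;0,y)\, u_0(y)\, dy\, dx $$

$$ \le Ct + 2\int_{\{|y| > R/2\}} |y|^2 u_0(y)\, dy \to 0 $$

as $t \downarrow 0$ and then $R \to \infty$. □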

5.3 Relative entropy and heat flow as its gradient flow 

We define the relative entropy (with respect to the Lebesgue measure) by

$$ \mathrm{Ent}(\mu) := \int_{\mathbb{R}^n} \rho \log \rho\, dx \in (-\infty, \infty] $$

for $\mu = \rho\,dx \in \mathcal{P}_2^{ac}(\mathbb{R}^n)$, and $\mathrm{Ent}(\mu) := \infty$ for $\mu \in \mathcal{P}_2(\mathbb{R}^n) \setminus \mathcal{P}_2^{ac}(\mathbb{R}^n)$. See [JKO, Section 4] (and Remark 5.1(a)) for the fact $\mathrm{Ent} > -\infty$ on $\mathcal{P}_2(\mathbb{R}^n)$. We know that Ent is convex along geodesics in $(\mathcal{P}_2(\mathbb{R}^n), W_2)$ ([Vi, page 908]). There is a well-established theory on gradient flows of such convex functionals, for which we refer to [AGS]. Here we explain that a global solution to the heat equation gives the gradient flow of the relative entropy along the lines of [OS1], [AGS] and [Er]. The next lemma corresponds to [OS1, Proposition 7.7]; see also [AGS, Theorem 10.4.17] and [Er, Proposition 4.3].

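Lemma 5.4 For $\mu = \rho\,dx \in \mathcal{P}_2^{ac}(\mathbb{R}^n)$ with $\rho \in H^1_{\mathrm{loc}}(\mathbb{R}^n)$ and $\mathrm{Ent}(\mu) < \infty$, the following are equivalent.

(I) $-\mathrm{Ent}$ is differentiable at $\mu$.

(II) $\|\nabla(-\rho)\|/\rho \in L^2(\mathbb{R}^n, \mu)$.

Moreover, then $\nabla(-\rho)/\rho \in T_\mu\mathcal{P}$ and we have

$$ \nabla_W(-\mathrm{Ent})(\mu) = \frac{\nabla(-\rho)}{\rho}. $$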

P 

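Proof. (I) ⇒ (II): It suffices to see that $|\nabla_- \mathrm{Ent}|(\mu)$ defined as in (3.2) is finite; then [Er, Proposition 4.3] yields (II) because our norm is comparable to a Euclidean norm. Take $\nu \in \mathcal{P}_2(\mathbb{R}^n)$ with $\mathrm{Ent}(\nu) < \infty$ and let $\mu_t = (\mathcal{T}_t)_\sharp \mu$ with $\mathcal{T}_t(x) = x + t\Phi(x)$ be the minimal geodesic from $\mu_0 = \mu$ to $\mu_1 = \nu$. We deduce from the convexity of $\mathrm{Ent}(\mu_t)$ that

$$ \limsup_{t \downarrow 0} \frac{\mathrm{Ent}(\mu) - \mathrm{Ent}(\mu_t)}{W_2(\mu, \mu_t)} \ge \frac{\mathrm{Ent}(\mu) - \mathrm{Ent}(\nu)}{W_2(\mu, \nu)}. $$

Thanks to the differentiability of $-\mathrm{Ent}$, we observe

$$ \limsup_{t \downarrow 0} \frac{\mathrm{Ent}(\mu) - \mathrm{Ent}(\mu_t)}{t} \le \int_{\mathbb{R}^n} [D(-\mathrm{Ent})(\mu)](\Phi)\, d\mu \le F_\mu\big(\nabla_W(-\mathrm{Ent})(\mu)\big) \cdot W_2(\mu, \nu). $$

Therefore we have $|\nabla_- \mathrm{Ent}|(\mu) \le F_\mu(\nabla_W(-\mathrm{Ent})(\mu)) < \infty$.

(II) ⇒ (I): Again due to [Er, Proposition 4.3], we obtain $|\nabla_- \mathrm{Ent}|(\mu) < \infty$ as well as $D(-\rho)/\rho \in T_\mu^*\mathcal{P}$. For any minimal geodesic $(\mu_t)_{t \in [0,1]}$ as in (I) ⇒ (II), [Vi, Theorem 23.14] (see also [OS1, Proposition 7.7] for the compact case) shows that

$$ \lim_{t \downarrow 0} \frac{\mathrm{Ent}(\mu) - \mathrm{Ent}(\mu_t)}{t} = \int_{\mathbb{R}^n} \mathrm{div}(\Phi)\, d\mu \le \int_{\mathbb{R}^n} \Big[\frac{D(-\rho)}{\rho}\Big](\Phi)\, d\mu, $$

and equality holds if $\Phi \in \mathcal{TP}$. Thus $-\mathrm{Ent}$ is differentiable and it holds $\nabla_W(-\mathrm{Ent})(\mu) = \nabla(-\rho)/\rho$. □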

The following theorem is a slight modification of [OS1, Theorem 7.8] adapted to noncompact spaces. For the sake of simplicity, we are concerned with the reverse heat equation, that is, the heat equation with respect to the reverse norm $\|x\|_\leftarrow := \|-x\|$. Since $\overleftarrow{\nabla} u = -\nabla(-u)$, we can write it as

$$ \int_{\mathbb{R}^n} \psi\, \partial_t u\, dx = -\int_{\mathbb{R}^n} D\psi(\overleftarrow{\nabla} u)\, dx = \int_{\mathbb{R}^n} D\psi\big(\nabla(-u)\big)\, dx. \tag{5.5} $$


Theorem 5.5 (Heat flow as gradient flow) (i) Let $(\rho_t)_{t \ge 0} \subset H_0^1(\mathbb{R}^n)$ be a global solution to the reverse heat equation with $\rho_0\,dx \in \mathcal{P}_2(\mathbb{R}^n)$. Then $\mu_t := \rho_t\,dx$ is a gradient curve of the relative entropy (in the sense of Definition 5.2).

(ii) Conversely, let $(\mu_t)_{t \ge 0} \subset \mathcal{P}_2^{ac}(\mathbb{R}^n)$ be a gradient curve of the relative entropy, put $\mu_t = \rho_t\,dx$ and assume that $\rho_t \in H_0^1(\mathbb{R}^n)$ for a.e. $t$. Then $\rho_t$ is a global solution to the reverse heat equation.

Proof. (i) We first of all remark that $\mathrm{Ent}(\mu_t) < \infty$ for all $t > 0$ by the upper Gaussian estimate for fundamental solutions (see the proof of Lemma 5.3(ii)). It follows from the reverse heat equation (5.5) that $\nabla(-\rho)/\rho$ satisfies the continuity equation (5.1) along the curve $(\mu_t)_{t > 0}$. More generally, we have

$$ \int_{\mathbb{R}^n} \psi_T\, d\mu_T - \int_{\mathbb{R}^n} \psi_\tau\, d\mu_\tau = \int_\tau^T \int_{\mathbb{R}^n} \Big\{\partial_t \psi + D\psi\Big(\frac{\nabla(-\rho_t)}{\rho_t}\Big)\Big\}\, d\mu_t\, dt \tag{5.6} $$

for all $\psi \in C_c^\infty([0,\infty) \times \mathbb{R}^n)$ and $0 < \tau < T < \infty$. Then we obtain, by choosing a test function $\psi$ approximating $\log(\max\{\rho_t, \varepsilon\}) - \log\varepsilon$ on $[\tau, T] \times \mathbb{R}^n$ and then letting $\varepsilon \downarrow 0$,

$$ \int_\tau^T \int_{\mathbb{R}^n} \frac{\|\nabla(-\rho_t)\|^2}{\rho_t}\, dx\, dt = \mathrm{Ent}(\mu_\tau) - \mathrm{Ent}(\mu_T) < \infty. $$

Hence $(\mu_t)_{t > 0}$ is absolutely continuous (see [AGS, Theorem 8.3.1]) and, as $T > 0$ was arbitrary, we find $\|\nabla(-\rho_t)\|/\rho_t \in L^2(\mathbb{R}^n, \mu_t)$ at a.e. $t$. This implies that $-\mathrm{Ent}$ is differentiable at $\mu_t$ and $\nabla_W(-\mathrm{Ent})(\mu_t) = \nabla(-\rho_t)/\rho_t \in T_{\mu_t}\mathcal{P}$ at a.e. $t$ (Lemma 5.4). Combining $\nabla(-\rho_t)/\rho_t \in T_{\mu_t}\mathcal{P}$ with (5.6), we conclude $\dot\mu_t = \nabla(-\rho_t)/\rho_t$, and thus $(\mu_t)_{t \ge 0}$ is a gradient curve of Ent in the sense of Definition 5.2.

(ii) Note that Lemma 5.4 ensures $\dot\mu_t=\nabla_W(-\mathrm{Ent})(\mu_t)=\nabla(-\rho_t)/\rho_t$ for a.e. $t$. Then the continuity equation with $\Phi=\nabla(-\rho_t)/\rho_t$ immediately implies the reverse heat equation.
□
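On the Euclidean line (a special, symmetric case chosen only for illustration), the profile $\rho_t(x)=(4\pi(t+a))^{-1/2}e^{-x^2/(4(t+a))}$ solves $\partial_t\rho=\rho''$, and its velocity field $\nabla(-\rho_t)/\rho_t=x/(2(t+a))$ satisfies the continuity equation, in line with part (i) of Theorem 5.5. The finite-difference sketch below checks both equations at a few points; the shift $a=0.5$ is arbitrary.

```python
import math

A = 0.5  # hypothetical initial shift a > 0


def rho(t, x):
    """Heat kernel profile (4 pi (t + a))^(-1/2) exp(-x^2 / (4 (t + a)))."""
    s = t + A
    return math.exp(-x * x / (4 * s)) / math.sqrt(4 * math.pi * s)


def velocity(t, x):
    """nabla(-rho_t)/rho_t = -rho_x/rho = x / (2 (t + a)) for the Euclidean norm."""
    return x / (2 * (t + A))


def residuals(t, x, h=1e-4):
    """Central-difference residuals of the heat and continuity equations."""
    rho_t = (rho(t + h, x) - rho(t - h, x)) / (2 * h)
    rho_xx = (rho(t, x + h) - 2 * rho(t, x) + rho(t, x - h)) / (h * h)

    def flux(y):
        return rho(t, y) * velocity(t, y)

    div_flux = (flux(x + h) - flux(x - h)) / (2 * h)
    return rho_t - rho_xx, rho_t + div_flux


for x in (-2.0, -0.3, 0.0, 1.1, 2.5):
    print(x, residuals(1.0, x))
```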

Remark 5.6 The formula $\nabla_W(-\mathrm{Ent})(\mu)=\nabla(-\rho)/\rho$ in Lemma 5.4 has an extra importance in the Finsler/Minkowski setting. Via integration by parts, the reverse heat equation (5.5) is rewritten as $\int_{\mathbb{R}^n}\psi\,\partial_t\rho\,dx=\int_{\mathbb{R}^n}\psi\,\Delta^{\nabla(-\rho)}\rho\,dx$ (in other words, $\partial_t\rho=\Delta^{\nabla(-\rho)}\rho$, see (5.4)). Then the homogeneity $g_{\nabla(-\rho)}=g_{\nabla(-\rho)/\rho}$ guarantees that a formal calculation with respect to the time-dependent Riemannian structure $g_{\nabla(-\rho)}$ verifies Theorem 5.5.
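The homogeneity used in Remark 5.6 is the $0$-homogeneity $g_{cv}=g_v$ ($c>0$) of the fundamental tensor, applied with $c=1/\rho(x)$. The sketch below assumes the $\ell^p$-norm with $p\ge2$ and a vector with nonzero entries, so that the Hessian of $\|\cdot\|_p^2/2$ has the closed form coded here; the vector and the density value are made-up numbers standing in for $\nabla(-\rho)(x)$ and $\rho(x)$. It also checks $g_v(v,v)=\|v\|_p^2$, the Cauchy–Schwarz inequality $F(v)^2g_v(w,w)\ge g_v(v,w)^2$, and that $p=2$ gives the identity.

```python
import math


def lp_norm(v, p):
    """The l^p norm of a finite sequence."""
    return sum(abs(c) ** p for c in v) ** (1.0 / p)


def fundamental_tensor(v, p):
    """g_v = Hessian of ||.||_p^2 / 2 at v (assumes p >= 2 and all v_i != 0)."""
    n = lp_norm(v, p)
    s = [math.copysign(abs(c) ** (p - 1), c) for c in v]
    dim = len(v)
    g = [[(2 - p) * n ** (2 - 2 * p) * s[i] * s[j] for j in range(dim)]
         for i in range(dim)]
    for i, c in enumerate(v):
        g[i][i] += (p - 1) * n ** (2 - p) * abs(c) ** (p - 2)
    return g


def form(g, u, w):
    """Bilinear form g(u, w)."""
    return sum(g[i][j] * u[i] * w[j] for i in range(len(u)) for j in range(len(w)))


p = 4.0
v = [0.8, -0.3, 1.5]   # made-up stand-in for nabla(-rho)(x)
rho_x = 0.37           # made-up positive value rho(x)
w = [0.2, 1.0, -0.4]   # an arbitrary second vector

g_grad = fundamental_tensor(v, p)
g_ratio = fundamental_tensor([c / rho_x for c in v], p)   # g at nabla(-rho)/rho
gap = max(abs(g_grad[i][j] - g_ratio[i][j]) for i in range(3) for j in range(3))
cauchy_schwarz = lp_norm(v, p) ** 2 * form(g_grad, w, w) - form(g_grad, v, w) ** 2
print(gap, form(g_grad, v, v), lp_norm(v, p) ** 2, cauchy_schwarz)
```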



5.4 Skew convexity and Wasserstein contraction 

To show a result analogous to Theorem 3.2 for the relative entropy, we prove the first variation formula for the Wasserstein distance along the heat flow (along the lines of [AGS, Section 10.2]).

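Proposition 5.7 (First variation formula for $W_2^2$ along heat flow) For any global solutions $(\rho_t)_{t\ge0},(\sigma_t)_{t\ge0}\subset H^1_0(\mathbb{R}^n)$ to the reverse heat equation (5.5) such that $\mu_t=\rho_t\,dx\in\mathcal{P}_2^{ac}(\mathbb{R}^n)$ and $\nu_t=\sigma_t\,dx\in\mathcal{P}_2^{ac}(\mathbb{R}^n)$, we have

$$\lim_{\tau\downarrow t}\frac{W_2(\mu_\tau,\nu_\tau)^2-W_2(\mu_t,\nu_t)^2}{2(\tau-t)}=\int_{\mathbb{R}^n}g_{\dot\omega_1}(\dot\omega_1,\dot\nu_t)\,d\nu_t-\int_{\mathbb{R}^n}g_{\dot\omega_0}(\dot\omega_0,\dot\mu_t)\,d\mu_t \tag{5.7}$$

for all $t>0$, where $\omega:[0,1]\to\mathcal{P}_2(\mathbb{R}^n)$ is the minimal geodesic from $\mu_t$ to $\nu_t$.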

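Proof. Set $l(t):=W_2(\mu_t,\nu_t)$, fix $\delta>0$ and define $\Gamma$ as the set of continuous curves $\xi:[t,t+\delta]\to\mathbb{R}^n$ endowed with the uniform topology. For $\tau\in[t,t+\delta]$, we define the evaluation map $e_\tau:\Gamma\to\mathbb{R}^n$ by $e_\tau(\xi):=\xi(\tau)$. Then there exist probability measures $\Pi,\Xi\in\mathcal{P}(\Gamma)$ such that $(e_\tau)_\sharp\Pi=\mu_\tau$ and $(e_\tau)_\sharp\Xi=\nu_\tau$ for all $\tau\in[t,t+\delta]$, and such that $\Pi$ and $\Xi$ are concentrated on the sets of $C^1$-curves $\xi$ and $\zeta$ solving

$$\dot\xi(\tau)=\frac{\nabla(-\rho_\tau)}{\rho_\tau}\big(\xi(\tau)\big),\qquad\dot\zeta(\tau)=\frac{\nabla(-\sigma_\tau)}{\sigma_\tau}\big(\zeta(\tau)\big)$$

for all $\tau\in[t,t+\delta]$, respectively (cf. [AGS, Theorem 8.2.1]). We remark that $\rho_\tau,\sigma_\tau>0$ for $\tau>0$ by the lower Gaussian estimate.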






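To see '$\le$' in (5.7), we disintegrate $\Pi$ and $\Xi$ by using $\mu_t$ and $\nu_t$ as $d\Pi=d\Pi^t_x\,d\mu_t(x)$ and $d\Xi=d\Xi^t_y\,d\nu_t(y)$, where $\Pi^t_x,\Xi^t_y\in\mathcal{P}(\Gamma)$ are concentrated on the sets $e_t^{-1}(x)$ and $e_t^{-1}(y)$, respectively. Take the unique minimal geodesic $\omega:[0,1]\to\mathcal{P}_2(\mathbb{R}^n)$ from $\mu_t$ to $\nu_t$, and let $\pi_t$ be the unique optimal coupling of $\mu_t$ and $\nu_t$. Then we find, for each $\tau\in[t,t+\delta]$,

$$l(\tau)^2\le\int_{\mathbb{R}^n\times\mathbb{R}^n}\int_{\Gamma\times\Gamma}\|\zeta(\tau)-\xi(\tau)\|^2\,d\Pi^t_x(\xi)\,d\Xi^t_y(\zeta)\,d\pi_t(x,y).$$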

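We deduce from the first variation formula (2.6) on the underlying space $(\mathbb{R}^n,\|\cdot\|)$ that

$$\begin{aligned}\lim_{\tau\downarrow t}\frac{\|\zeta(\tau)-\xi(\tau)\|^2-\|\zeta(t)-\xi(t)\|^2}{2(\tau-t)}&=g_{\zeta(t)-\xi(t)}\big(\zeta(t)-\xi(t),\dot\zeta(t)\big)-g_{\zeta(t)-\xi(t)}\big(\zeta(t)-\xi(t),\dot\xi(t)\big)\\&=g_{\zeta(t)-\xi(t)}\Big(\zeta(t)-\xi(t),\frac{\nabla(-\sigma_t)}{\sigma_t}(\zeta(t))-\frac{\nabla(-\rho_t)}{\rho_t}(\xi(t))\Big)\end{aligned}$$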

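for $\Pi$-a.e. $\xi$ and $\Xi$-a.e. $\zeta$. Since $\rho$ and $\sigma$ are $C^1$ on $(0,\infty)\times\mathbb{R}^n$, this convergence is uniform on

$$\Omega_\varepsilon:=\{x\in\mathbb{R}^n\mid\|x\|<\varepsilon^{-1},\ \rho_t(x)>\varepsilon,\ \sigma_t(x)>\varepsilon\}$$

for each $\varepsilon>0$. In order to see that the effect of $\mathbb{R}^n\setminus\Omega_\varepsilon$ is negligible as $\varepsilon$ tends to zero, we observe from

$$\|\zeta(\tau)-\xi(\tau)\|^2-\|\zeta(t)-\xi(t)\|^2\le\big(\|\zeta(\tau)-\xi(\tau)\|+\|\zeta(t)-\xi(t)\|\big)\big(\|\zeta(\tau)-\zeta(t)\|+\|\xi(t)-\xi(\tau)\|\big)$$

that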

'xi» Jrxr 



r 



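This is finite uniformly in $\tau\in(t,t+\delta]$, because

$$\frac{1}{(\tau-t)^2}\int_\Gamma\|\zeta(\tau)-\zeta(t)\|^2\,d\Xi(\zeta)\le\frac{1}{\tau-t}\int_\Gamma\int_t^\tau\|\dot\zeta(s)\|^2\,ds\,d\Xi(\zeta).$$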



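Therefore we obtain

$$\limsup_{\tau\downarrow t}\frac{l(\tau)^2-l(t)^2}{2(\tau-t)}\le\int_{\mathbb{R}^n\times\mathbb{R}^n}\int_{\Gamma\times\Gamma}g_{y-x}\big(y-x,\dot\zeta(t)-\dot\xi(t)\big)\,d\Pi^t_x(\xi)\,d\Xi^t_y(\zeta)\,d\pi_t(x,y).$$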
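Note that

$$\int_{\mathbb{R}^n\times\mathbb{R}^n}g_{y-x}\Big(y-x,\frac{\nabla(-\rho_t)}{\rho_t}(x)\Big)\,d\pi_t(x,y)=\int_{\mathbb{R}^n}g_{\dot\omega_0(x)}\big(\dot\omega_0(x),\dot\mu_t(x)\big)\,d\mu_t(x)=:\int_{\mathbb{R}^n}g_{\dot\omega_0}(\dot\omega_0,\dot\mu_t)\,d\mu_t,$$

and analogously for the term involving $\nu_t$ and $\dot\omega_1$.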

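Hence we have

$$\limsup_{\tau\downarrow t}\frac{l(\tau)^2-l(t)^2}{2(\tau-t)}\le\int_{\mathbb{R}^n}g_{\dot\omega_1}(\dot\omega_1,\dot\nu_t)\,d\nu_t-\int_{\mathbb{R}^n}g_{\dot\omega_0}(\dot\omega_0,\dot\mu_t)\,d\mu_t.$$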

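To see the reverse inequality, we fix $\tau\in(t,t+\delta)$, take the optimal coupling $\pi_\tau$ of $\mu_\tau$ and $\nu_\tau$, and disintegrate $\Pi$ and $\Xi$ as $d\Pi=d\Pi^\tau_x\,d\mu_\tau(x)$ and $d\Xi=d\Xi^\tau_y\,d\nu_\tau(y)$. Observe that

$$l(\tau)^2-l(t)^2\ge\int_{\mathbb{R}^n\times\mathbb{R}^n}\int_{\Gamma\times\Gamma}\big\{\|\zeta(\tau)-\xi(\tau)\|^2-\|\zeta(t)-\xi(t)\|^2\big\}\,d\Pi^\tau_x(\xi)\,d\Xi^\tau_y(\zeta)\,d\pi_\tau(x,y).$$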

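Since the function

$$[0,1]\ni s\longmapsto\big\|\{(1-s)\zeta(t)+s\zeta(\tau)\}-\{(1-s)\xi(t)+s\xi(\tau)\}\big\|^2$$

is convex, the first variation formula (2.6) (at $s=0$) yields that

$$\|\zeta(\tau)-\xi(\tau)\|^2-\|\zeta(t)-\xi(t)\|^2\ge2g_{\zeta(t)-\xi(t)}\big(\zeta(t)-\xi(t),\{\zeta(\tau)-\zeta(t)\}-\{\xi(\tau)-\xi(t)\}\big).$$

Thus we find

$$\frac{l(\tau)^2-l(t)^2}{2(\tau-t)}\ge\int_{\mathbb{R}^n\times\mathbb{R}^n}\int_{\Gamma\times\Gamma}g_{\zeta(t)-\xi(t)}\Big(\zeta(t)-\xi(t),\frac{\zeta(\tau)-\zeta(t)}{\tau-t}-\frac{\xi(\tau)-\xi(t)}{\tau-t}\Big)\,d\Pi^\tau_x(\xi)\,d\Xi^\tau_y(\zeta)\,d\pi_\tau(x,y).$$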



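Recall that $(\xi(\tau)-\xi(t))/(\tau-t)$ converges to $\dot\xi(t)=[\nabla(-\rho_t)/\rho_t](\xi(t))$ uniformly on $\Omega_\varepsilon$. Moreover,

$$\pi^t_\tau:=(e_t\times e_t)_\sharp\Big(\int_{\mathbb{R}^n\times\mathbb{R}^n}\Pi^\tau_x\times\Xi^\tau_y\,d\pi_\tau(x,y)\Big)\in\mathcal{P}_2(\mathbb{R}^n\times\mathbb{R}^n)$$

weakly converges to $\pi_t$ as $\tau\downarrow t$, due to [AGS, Lemma 10.2.8]. Precisely, as $\pi^t_\tau$ is a coupling of $\mu_t$ and $\nu_t$, the family $\{\pi^t_\tau\}_{\tau\in(t,t+\delta)}$ is relatively compact (cf. [AGS, Remark 5.2.3]).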
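Combining this with the simple estimate

$$\Big(\int_{\mathbb{R}^n\times\mathbb{R}^n}\|y-x\|^2\,d\pi^t_\tau(x,y)\Big)^{1/2}\le\Big(\int_\Gamma\|\zeta(t)-\zeta(\tau)\|^2\,d\Xi(\zeta)\Big)^{1/2}+W_2(\mu_\tau,\nu_\tau)+\Big(\int_\Gamma\|\xi(\tau)-\xi(t)\|^2\,d\Pi(\xi)\Big)^{1/2}\longrightarrow W_2(\mu_t,\nu_t)\quad(\tau\downarrow t)$$

and the uniqueness of the optimal coupling $\pi_t$, we see that the limit of $\pi^t_\tau$ must be $\pi_t$. Therefore we obtain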



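$$\liminf_{\tau\downarrow t}\frac{l(\tau)^2-l(t)^2}{2(\tau-t)}\ge\int_{\mathbb{R}^n\times\mathbb{R}^n}g_{y-x}\Big(y-x,\frac{\nabla(-\sigma_t)}{\sigma_t}(y)-\frac{\nabla(-\rho_t)}{\rho_t}(x)\Big)\,d\pi_t(x,y)$$

and complete the proof.
□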
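The two ingredients taken from (2.6) in the proof above are the derivative formula $\frac{d}{d\tau}\frac12\|w(\tau)\|^2=g_{w}(w,\dot w)$ and, for the reverse inequality, the convexity bound $\|b\|^2-\|a\|^2\ge2g_a(a,b-a)$. A numerical sketch for the $\ell^3$-norm on $\mathbb{R}^2$ follows; the curves $\xi$ and $\zeta$ are hypothetical stand-ins for particle paths, and the convexity bound is tested at randomly sampled points.

```python
import math
import random


def lp_norm(v, p):
    """The l^p norm of a finite sequence."""
    return sum(abs(c) ** p for c in v) ** (1.0 / p)


def g_first(w, u, p):
    """g_w(w, u) = D(||.||_p^2 / 2)(w)(u), valid for w != 0."""
    n = lp_norm(w, p)
    return sum(n ** (2 - p) * math.copysign(abs(c) ** (p - 1), c) * d
               for c, d in zip(w, u))


p = 3.0


# Hypothetical smooth curves standing in for xi and zeta.
def xi(s):
    return (math.cos(s), s * s)


def dxi(s):
    return (-math.sin(s), 2 * s)


def zeta(s):
    return (1 + s, math.sin(2 * s))


def dzeta(s):
    return (1.0, 2 * math.cos(2 * s))


def w(s):
    """zeta(s) - xi(s)."""
    return tuple(a - b for a, b in zip(zeta(s), xi(s)))


t, dt = 0.3, 1e-6
quotient = (lp_norm(w(t + dt), p) ** 2 - lp_norm(w(t), p) ** 2) / (2 * dt)
dw = tuple(a - b for a, b in zip(dzeta(t), dxi(t)))
first_variation = g_first(w(t), dw, p)

# Convexity bound used for the reverse inequality: ||b||^2 - ||a||^2 >= 2 g_a(a, b - a).
rng = random.Random(0)
gaps = []
for _ in range(1000):
    a = [rng.uniform(-2, 2) for _ in range(2)]
    b = [rng.uniform(-2, 2) for _ in range(2)]
    diff = [y - x for x, y in zip(a, b)]
    gaps.append(lp_norm(b, p) ** 2 - lp_norm(a, p) ** 2 - 2 * g_first(a, diff, p))
print(quotient, first_variation, min(gaps))
```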
Now, the following is shown in a similar way to Theorem 3.2.

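Proposition 5.8 (Skew convexity versus Wasserstein contraction) For $K\in\mathbb{R}$, the following are equivalent.

(I) The relative entropy is $K$-skew convex in the sense that, for any $\mu=\rho\,dx\in\mathcal{P}_2^{ac}(\mathbb{R}^n)$ such that $\rho\in H^1_0(\mathbb{R}^n)\cap C^1(\mathbb{R}^n)$, $\mathrm{Ent}(\mu)<\infty$ and $\|\nabla(-\rho)\|/\rho\in L^2(\mathbb{R}^n,\mu)$, and for any $\nu=\sigma\,dx\in\mathcal{P}_2^{ac}(\mathbb{R}^n)$ satisfying the same conditions, it holds

$$\int_{\mathbb{R}^n}g_{\dot\omega_1}\big(\dot\omega_1,\nabla_W(-\mathrm{Ent})(\nu)\big)\,d\nu-\int_{\mathbb{R}^n}g_{\dot\omega_0}\big(\dot\omega_0,\nabla_W(-\mathrm{Ent})(\mu)\big)\,d\mu\le-KW_2(\mu,\nu)^2, \tag{5.8}$$

where $\omega:[0,1]\to\mathcal{P}_2(\mathbb{R}^n)$ is the minimal geodesic from $\mu$ to $\nu$.

(II) The reverse heat flow is $K$-contractive in the sense that, for any global solutions $(\rho_t)_{t\ge0},(\sigma_t)_{t\ge0}\subset H^1_0(\mathbb{R}^n)$ to the reverse heat equation such that $\mu_t:=\rho_t\,dx\in\mathcal{P}_2^{ac}(\mathbb{R}^n)$ and $\nu_t:=\sigma_t\,dx\in\mathcal{P}_2^{ac}(\mathbb{R}^n)$, we have

$$W_2(\mu_t,\nu_t)\le e^{-Kt}W_2(\mu_0,\nu_0)\quad\text{for all }t\in[0,\infty). \tag{5.9}$$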

We remark that, in the implication (II) $\Rightarrow$ (I), we need the $C^1$-regularity of $\rho$ and $\sigma$ to apply (the proof of) Proposition 5.7 at $t=0$. Indeed, as each spatial derivative $v:=\partial\rho/\partial x^k$ ($k=1,2,\ldots,n$) again solves the linear parabolic equation $\partial_tv=\Delta^{\nabla(-\rho)}v$ ([OS1, Lemma 4.7]), the upper Gaussian estimate for the fundamental solution ensures that $\partial\rho_t/\partial x^k$ tends to $\partial\rho_0/\partial x^k$ locally uniformly.

As a corollary to Proposition 5.8, we obtain the $0$-contraction of gradient curves in a special class of symmetric measures (compare this with Step 0 in the next section).

Corollary 5.9 (Non-expansion for Gaussian measures) Let $(\mathbb{R}^n,\|\cdot\|)$ be a symmetric normed space (i.e., $\|-x\|=\|x\|$). Take two probability measures of Gaussian form

$$d\mu(x)=Ca^{-n/2}\exp\Big(-\frac{\|x-y\|^2}{4a}\Big)dx,\qquad d\nu(x)=Cb^{-n/2}\exp\Big(-\frac{\|x-z\|^2}{4b}\Big)dx$$

for some $a,b>0$, $y,z\in\mathbb{R}^n$ and the normalizing constant $C>0$. Then the gradient curves $(\mu_t)_{t\ge0}$, $(\nu_t)_{t\ge0}$ of $\mathrm{Ent}$ starting from them satisfy $W_2(\mu_t,\nu_t)\le W_2(\mu_0,\nu_0)$ for all $t\ge0$.

Proof. Without loss of generality, we assume $y=0$ and $a\ge b$. Solving the heat equation, we observe

$$d\mu_t(x)=C(t+a)^{-n/2}\exp\Big(-\frac{\|x\|^2}{4(t+a)}\Big)dx,\qquad d\nu_t(x)=C(t+b)^{-n/2}\exp\Big(-\frac{\|x-z\|^2}{4(t+b)}\Big)dx.$$

Note also that the unique minimal geodesic $(\omega_s)_{s\in[0,1]}$ from $\mu_t$ to $\nu_t$ is given by $(T_s)_\sharp\mu_t$, where

$$T_s(x):=\Big\{(1-s)+s\sqrt{\frac{t+b}{t+a}}\Big\}x+sz.$$

We can explicitly write $\omega_s$ as

$$d\omega_s(x)=C\kappa_s^{-n/2}\exp\Big(-\frac{\|x-sz\|^2}{4\kappa_s}\Big)dx,\qquad\kappa_s:=\big((1-s)\sqrt{t+a}+s\sqrt{t+b}\big)^2.$$

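It follows from Lemma 5.4 that

$$\begin{aligned}\int_{\mathbb{R}^n}g_{\dot\omega_s}\big(\dot\omega_s,\nabla_W(-\mathrm{Ent})(\omega_s)\big)\,d\omega_s&=\int_{\mathbb{R}^n}g_{T_1(x)-x}\big(T_1(x)-x,[\nabla_W(-\mathrm{Ent})(\omega_s)](T_s(x))\big)\,d\mu_t(x)\\&=\frac{1}{2\kappa_s}\int_{\mathbb{R}^n}g_{T_1(x)-x}\big(T_1(x)-x,T_s(x)-sz\big)\,d\mu_t(x)\\&=-\frac{(1-s)+s\sqrt{(t+b)/(t+a)}}{2\kappa_s}\int_{\mathbb{R}^n}g_{T_1(x)-x}\big(x-T_1(x),x\big)\,d\mu_t(x).\end{aligned}$$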

Observe that the coefficient in the last line, which equals $-1/\{2\sqrt{t+a}\,((1-s)\sqrt{t+a}+s\sqrt{t+b})\}$, is non-increasing in $s$ since $a\ge b$. Hence it suffices to show that $\Theta:=\int_{\mathbb{R}^n}g_{T_1(x)-x}(x-T_1(x),x)\,d\mu_t(x)$ (which is independent of $s$) is nonnegative. If $a=b$, then we find $T_1(x)-x=z$ and $\Theta=0$ by the symmetry of $\mu_t$. If $a>b$, then we put $z':=s'z:=\{1-\sqrt{(t+b)/(t+a)}\}^{-1}z$ and deduce $T_{s'}(x)=z'$ (note that $s'>1$). Thus we have

$$g_{T_1(x)-x}\big(x-T_1(x),x\big)=\frac{1}{s'}g_{z'-x}(x-z',x)=\frac{1}{2s'}\big[D(\|z'-\cdot\|^2)(x)\big](x),$$

and $[D(\|z'-\cdot\|^2)(x)](x)+[D(\|z'-\cdot\|^2)(-x)](-x)\ge0$ by the convexity of $\|z'-\cdot\|^2$ (along with the symmetry of $\|\cdot\|$). Therefore we obtain $\Theta\ge0$, and Proposition 5.8 completes the proof. □
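The two facts behind this proof can be checked numerically: the coefficient $-\lambda_s/(2\kappa_s)$, with $\lambda_s=(1-s)+s\sqrt{(t+b)/(t+a)}$, is non-increasing in $s$ when $a\ge b$, and $[D(\|z'-\cdot\|^2)(x)](x)+[D(\|z'-\cdot\|^2)(-x)](-x)\ge0$. The sketch below uses the arbitrary parameters $t=0.5$, $a=2$, $b=0.3$ and, for the second fact, the $\ell^4$-norm on $\mathbb{R}^2$ at random points; it is an illustration, not part of the argument.

```python
import math
import random


def lp_norm(v, p):
    """The l^p norm of a finite sequence."""
    return sum(abs(c) ** p for c in v) ** (1.0 / p)


def coefficient(s, t, a, b):
    """-lambda_s / (2 kappa_s) from the last line of the computation above."""
    lam = (1 - s) + s * math.sqrt((t + b) / (t + a))
    kappa = ((1 - s) * math.sqrt(t + a) + s * math.sqrt(t + b)) ** 2
    return -lam / (2 * kappa)


def dh(x, v, zp, p):
    """Derivative at x, in direction v, of h = ||z' - .||_p^2 (needs z' != x)."""
    w = [c - d for c, d in zip(zp, x)]
    n = lp_norm(w, p)
    return -2 * sum(n ** (2 - p) * math.copysign(abs(c) ** (p - 1), c) * d
                    for c, d in zip(w, v))


t, a, b = 0.5, 2.0, 0.3   # arbitrary parameters with a >= b
grid = [k / 100 for k in range(101)]
coeffs = [coefficient(s, t, a, b) for s in grid]

rng = random.Random(1)
p = 4.0
worst = math.inf
for _ in range(1000):
    x = [rng.uniform(-3, 3) for _ in range(2)]
    zp = [rng.uniform(-3, 3) for _ in range(2)]
    mx = [-c for c in x]
    worst = min(worst, dh(x, x, zp, p) + dh(mx, mx, zp, p))
print(coeffs[0], coeffs[-1], worst)
```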



6 Non-contraction of heat flow 

This section is devoted to a proof of Theorem 1.1. For notational simplicity, we prove it for the reverse norm; that is to say, we show that global solutions to the reverse heat equation (5.3) are not $K$-contractive with respect to $W_2$.

Fix $\mu=\rho\,dx\in\mathcal{P}_2^{ac}(\mathbb{R}^n)$ such that $\rho\in H^1_0(\mathbb{R}^n)\cap C^1(\mathbb{R}^n)$, $\mathrm{Ent}(\mu)<\infty$ and $\|\nabla(-\rho)\|/\rho\in L^2(\mathbb{R}^n,\mu)$. For $T>1$, we set



$$\omega_s=\rho_s\,dx:=(F_{s/T})_\sharp\mu,\qquad F_{s/T}(x):=\Big(1-\frac{s}{T}\Big)x\quad\text{for }s\in[0,T].$$

Then $(\omega_s)_{s\in[0,T]}$ is the unique minimal geodesic from $\mu$ to the Dirac measure $\delta_O$ at the origin $O\in\mathbb{R}^n$, and its tangent vector field $\dot\omega_s$ is simply given by $\dot\omega_s(x)=-x/(T-s)$. Put $\nu:=\omega_1$. We will show that (5.8) is false for any given $K\in\mathbb{R}$ (i.e., $\mathrm{Ent}$ is not $K$-skew convex) by choosing a suitable $\rho$.

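We deduce from $\nabla_W(-\mathrm{Ent})(\omega_s)=\nabla(-\rho_s)/\rho_s$ (Lemma 5.4) that

$$\int_{\mathbb{R}^n}g_{\dot\omega_s}\big(\dot\omega_s,\nabla_W(-\mathrm{Ent})(\omega_s)\big)\,d\omega_s=\frac{1}{T-s}\int_{\operatorname{supp}\rho_s}g_{-x}\big(-x,\nabla(-\rho_s)(x)\big)\,dx.$$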
It follows from $\rho_s(x)=(T/(T-s))^n\rho(Tx/(T-s))$ and the change of variables formula that

$$\int_{\operatorname{supp}\rho_s}g_{-x}\big(-x,\nabla(-\rho_s)(x)\big)\,dx=\Big(\frac{T}{T-s}\Big)^{n+1}\int_{\operatorname{supp}\rho_s}g_{-x}\Big(-x,\nabla(-\rho)\Big(\frac{Tx}{T-s}\Big)\Big)\,dx=\int_{\operatorname{supp}\rho}g_{-x}\big(-x,\nabla(-\rho)(x)\big)\,dx.$$

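Thus we have

$$\int_{\mathbb{R}^n}g_{\dot\omega_s}\big(\dot\omega_s,\nabla_W(-\mathrm{Ent})(\omega_s)\big)\,d\omega_s=\frac{1}{T-s}\int_{\operatorname{supp}\rho}g_{-x}\big(-x,\nabla(-\rho)(x)\big)\,dx,$$

and hence

$$\frac{d}{ds}\int_{\mathbb{R}^n}g_{\dot\omega_s}\big(\dot\omega_s,\nabla_W(-\mathrm{Ent})(\omega_s)\big)\,d\omega_s=\frac{1}{(T-s)^2}\int_{\operatorname{supp}\rho}g_{-x}\big(-x,\nabla(-\rho)(x)\big)\,dx.$$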



Note that $W_2(\omega_0,\omega_1)^2=T^{-2}\int_{\mathbb{R}^n}\|-x\|^2\rho(x)\,dx$ and, by putting $f(x):=\|-x\|^2/2$ (as in
Section HJ), 

$$g_{-x}\big(-x,\nabla(-\rho)(x)\big)=[D(-f)(x)]\big(\nabla(-\rho)(x)\big).$$

We set

$$Q(\rho):=\frac{\int_{\operatorname{supp}\rho}[D(-f)(x)](\nabla(-\rho)(x))\,dx}{\int_{\mathbb{R}^n}\|-x\|^2\rho(x)\,dx} \tag{6.1}$$

and shall demonstrate that $Q(\rho)$ can be positive (Steps 0–2 below) and even arbitrarily large (Step 3) by choosing a suitable $\rho$, unless $\|\cdot\|$ is an inner product. Since integrating the last derivative over $s\in[0,1]$ shows that (5.8) for $\mu$ and $\nu=\omega_1$ reads $\frac{T}{T-1}Q(\rho)\le-K$, this means that (5.8) is false for any $K\in\mathbb{R}$, which completes the proof of Theorem 1.1. We start with an explicit example describing the heart of our construction.

Step 0 (The model case of $\ell^p$ on $\mathbb{R}^2$ with $2<p<\infty$) Let $\|\cdot\|$ be the $\ell^p$-norm of $\mathbb{R}^2$ with $2<p<\infty$. Take the unit vectors $a=(-1,0)$, $b=(2^{-1/p},2^{-1/p})$, $c=(2^{-1/p},-2^{-1/p})$ and let $\triangle ABC$ be the triangle tangent to the unit sphere of $\|\cdot\|$ at $a$, $b$, $c$. Precisely, $A=(2^{1-1/p},0)$, $B=(-1,-1-2^{1-1/p})$ and $C=(-1,1+2^{1-1/p})$ (Figure 3).



[Figure 3: the triangle $\triangle ABC$ with vertices $A(2^{1-1/p},0)$ and $C(-1,1+2^{1-1/p})$, tangent to the unit sphere at $a(-1,0)$, $b(2^{-1/p},2^{-1/p})$ and $c(2^{-1/p},-2^{-1/p})$.]
Define the nonnegative function $\rho:\mathbb{R}^2\to[0,\infty)$ by $\rho:=0$ outside $\triangle ABC$ and by $\rho(tx):=(1-t)\sigma$ for a point $x$ on the edges of $\triangle ABC$ and for $t\in[0,1]$, where the constant $\sigma>0$ is chosen so that $\int_{\mathbb{R}^2}\rho\,dx=1$. Note that the gradient vector $\nabla(-\rho)$ is $\sigma\cdot a=(-\sigma,0)$ inside $\triangle OBC$, $\sigma\cdot b=(2^{-1/p}\sigma,2^{-1/p}\sigma)$ inside $\triangle OAC$, and $\sigma\cdot c=(2^{-1/p}\sigma,-2^{-1/p}\sigma)$ inside $\triangle OAB$. Hence we have

$$\begin{aligned}\int_{\mathbb{R}^2}\nabla(-\rho)\,dx&=(1+2^{1-1/p})\cdot(-\sigma,0)+2^{1-1/p}(1+2^{1-1/p})\cdot(2^{-1/p}\sigma,0)\\&=(1+2^{1-1/p})\sigma\cdot(2^{1-2/p}-1,0).\end{aligned}\tag{6.2}$$

Note that $2^{1-2/p}-1>0$ since $p>2$.

Now, for large $R>1$, we consider the function $\rho_R(x):=\rho(x+(R,0))$. Then it follows from (6.2) and $\nabla(-f)(x)=-x$ that

$$\lim_{R\to\infty}\frac{1}{R}\int_{\operatorname{supp}\rho_R}[D(-f)(x)]\big(\nabla(-\rho_R)(x)\big)\,dx=(1+2^{1-1/p})(2^{1-2/p}-1)\sigma>0.$$

Therefore, by taking a smooth approximation of $\rho_R$ (satisfying the conditions imposed on $\rho$ at the beginning of the section) for sufficiently large $R$, we find $\rho$ satisfying $Q(\rho)>0$.
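The construction of Step 0 is explicit enough to be reproduced numerically. In the sketch below (helper names hypothetical), the tangent line of the unit $\ell^p$-sphere at a unit vector $u$ is $\{\xi_u=1\}$ with $\xi_u=(|u_i|^{p-1}\operatorname{sgn}u_i)_i$; the vertices are intersections of these lines, and the vector (6.2) is assembled from the areas of the three subtriangles with apex $O$ and the constant gradients $\sigma a$, $\sigma b$, $\sigma c$. The first component is compared with the closed form, and for $p=2$ it vanishes, as it must in an inner product space.

```python
import math


def tangent_covector(u, p):
    """xi_u = D||.||_p(u) at a unit vector u; the tangent line there is {xi_u = 1}."""
    return [math.copysign(abs(c) ** (p - 1), c) for c in u]


def intersect(xi1, xi2):
    """Solve xi1(x) = 1 and xi2(x) = 1 in R^2 by Cramer's rule."""
    det = xi1[0] * xi2[1] - xi1[1] * xi2[0]
    return ((xi2[1] - xi1[1]) / det, (xi1[0] - xi2[0]) / det)


def area_with_origin(P, Q):
    """Lebesgue area of the triangle O P Q."""
    return abs(P[0] * Q[1] - P[1] * Q[0]) / 2


def step0(p):
    """Vertices, height sigma and the vector (6.2) for the l^p model triangle."""
    r = 2 ** (-1 / p)
    a, b, c = (-1.0, 0.0), (r, r), (r, -r)
    la, lb, lc = (tangent_covector(u, p) for u in (a, b, c))
    A = intersect(lb, lc)   # CA is tangent at b, AB is tangent at c
    B = intersect(lc, la)   # BC is tangent at a
    C = intersect(la, lb)
    s_a = area_with_origin(B, C)   # triangle OBC, where nabla(-rho) = sigma a
    s_b = area_with_origin(C, A)   # triangle OCA, where nabla(-rho) = sigma b
    s_c = area_with_origin(A, B)   # triangle OAB, where nabla(-rho) = sigma c
    sigma = 3 / (s_a + s_b + s_c)  # the pyramid of height sigma has volume 1
    vec = tuple(sigma * (s_a * a[i] + s_b * b[i] + s_c * c[i]) for i in range(2))
    return A, B, C, sigma, vec


for p in (2.0, 3.0, 4.0, 8.0):
    alpha = 2 ** (1 - 1 / p)
    A, B, C, sigma, vec = step0(p)
    closed_form = (1 + alpha) * sigma * (2 ** (1 - 2 / p) - 1)
    print(p, A, B, C, vec, closed_form)
```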

Step 1 (General two-dimensional case) The argument in Step 0 shows the following claim. We denote by $S(1)$ the unit sphere of the norm $\|\cdot\|$.

Claim 6.1 Suppose that a Minkowski space $(\mathbb{R}^2,\|\cdot\|)$ admits a triangle $\triangle ABC$ such that the edges $AB$, $BC$, $CA$ are tangent to $S(1)$ at points $c$, $a$, $b$, respectively, and that the vector

$$|\triangle OAB|\cdot c+|\triangle OBC|\cdot a+|\triangle OCA|\cdot b \tag{6.3}$$

is nonzero, where $|\triangle OAB|$ denotes the area of $\triangle OAB$ with respect to the Lebesgue measure. Then there exists a function $\rho$ for which $Q(\rho)$ as in (6.1) is positive.

Note that (6.3) is always zero in inner product spaces. (Indeed, for the standard inner product, it holds that $\langle|\triangle OAB|\cdot c+|\triangle OBC|\cdot a+|\triangle OCA|\cdot b,e_i\rangle=0$ for $e_1=(1,0)$ and $e_2=(0,1)$ by the fundamental theorem of calculus applied to the function $\rho$ defined as in Step 0.) Claim 6.1 is sharp enough for our purpose, as we can verify the following.



Claim 6.2 There exists a triangle $\triangle ABC$ satisfying the condition in Claim 6.1 unless $\|\cdot\|$ is an inner product.

Although this claim should be a known fact (and there may be a simpler proof), we give a proof for completeness. We first treat the easier case of nonsymmetric norms. Choose a pair $a,b\in S(1)$ such that $b=-\lambda a$ with $\lambda\neq1$. If the tangent lines of $S(1)$ at $a$ and $b$ are not parallel in $\mathbb{R}^2$, then we draw a triangle $\triangle ABC$ in such a way that the edge $AB$ is parallel to $ba$. Since $b=-\lambda a$, (6.3) equals $(|\triangle OBC|-\lambda|\triangle OCA|)\cdot a+|\triangle OAB|\cdot c$; as the vectors $a$ and $c$ are linearly independent, (6.3) is not zero. In the other case where the tangent lines of $S(1)$ at $a$ and $b$ are parallel, we take $C'$ such that $OC'$ is parallel to these tangent lines, and draw the triangle $\triangle A'B'C'$ such that $A'B'$ is parallel to $b'a'$ ($a'$ and $b'$ are determined only by $C'$). By letting $C'$ go to infinity, $a'$ and $b'$ can be made arbitrarily close to $a$ and $b$, respectively. Then we observe that $|\triangle OA'B'|$ is much smaller than $|\triangle OB'C'|$ and $|\triangle OC'A'|$, and that the ratio $|\triangle OC'A'|/|\triangle OB'C'|$ is close to $\lambda$. Thus we have

$$|\triangle OA'B'|\cdot c'+|\triangle OB'C'|\cdot a'+|\triangle OC'A'|\cdot b'\approx|\triangle OB'C'|(1-\lambda^2)\cdot a\neq0.$$

Next we consider symmetric norms. We suppose that the sum (6.3) is always zero, and will see that $\|\cdot\|$ must be an inner product. Take $a,b\in S(1)$ with $b=-a$ such that $|a|=\sup_{x\in S(1)}|x|$, where $|\cdot|$ is the Euclidean norm. Then the tangent lines of $S(1)$ at $a$ and $b$ are perpendicular to $ab$ with respect to the Euclidean inner product. As in the nonsymmetric case, we take $C'$ so that $OC'$ is parallel to these tangent lines, and consider the triangle $\triangle A'B'C'$ for some fixed $c$. Let $C'$ diverge to infinity and denote the limits of $A'$, $B'$ by $A$, $B$. Then our hypothesis yields that the vector

$$|\triangle OaB|\cdot a+|\triangle OAb|\cdot b+|\triangle OAB|\cdot c \tag{6.4}$$

is independent of the choice of $c$ on the arc between $a$ and $b$ opposite to $C'$ (since $\triangle a'b'C'$ corresponding to $\triangle A'B'C'$ was independent of the choice of $c$). We will see that this is the case only for inner products. For simplicity, we assume that $a=(-1,0)$, $b=(1,0)$ and that $c$ is in the upper half plane. Define the function $h:[-1,1]\to[0,1]$ by $\|(t,h(t))\|=1$, and compare it with the function $\bar h:[-1,1]\to[0,1]$ such that $\{(t,\bar h(t))\}_{t\in[-1,1]}$ draws (the upper half of) the ellipse having $ab$ and $OD_0$ as its long and short axes, where $D_0=(0,\sup h)$ (Figure 4). We first suppose that $\sup h$ is attained at $t_0>0$, and put $c_0=(t_0,h(t_0))$, $A_0=(1,h(t_0))$ and $B_0=(-1,h(t_0))$. Then, on the one hand, clearly the $y$-components of the vectors $|\triangle OA_0B_0|\cdot c_0$ and $|\triangle OA_0B_0|\cdot D_0$ are the same. On the other hand, since only $c$ has a nonzero $y$-component in (6.4), $|\triangle OAB|\cdot c$ and $|\triangle OA_0B_0|\cdot c_0$ have the same $y$-component. Similarly, for any $t'\in(-1,1)$ and the points $D:=(t',\bar h(t'))$, $\bar A:=(1,\bar h(t')+(1-t')\bar h'(t'))$ and $\bar B:=(-1,\bar h(t')-(1+t')\bar h'(t'))$ corresponding to the ellipse drawn by $\bar h$, $|\triangle O\bar A\bar B|\cdot D$ has the same $y$-component as $|\triangle OA_0B_0|\cdot D_0$. Hence we have

$$|\triangle OAB|\cdot c-|\triangle O\bar A\bar B|\cdot D\in\mathbb{R}\times\{0\}.$$
In particular, for any $c=(t,h(t))$ with $t\in(t_0,1)$ and $t'\in(0,1)$ with $\bar h(t')=h(t)$, we obtain $|\triangle OAB|=|\triangle O\bar A\bar B|$ and hence $\bar h'(t')<h'(t)$ (more precisely, $AB$ and $\bar A\bar B$ must intersect on the $y$-axis). However, this is a contradiction since $h(1)=\bar h(1)=0$. We similarly derive a contradiction from $t_0<0$, so that $t_0=0$. Furthermore, $h$ must coincide with $\bar h$ everywhere by a similar discussion. Therefore $\|\cdot\|$ is an inner product, which completes the proof of Claim 6.2.
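For a general triangle circumscribed about the unit sphere, the vector (6.3) can be evaluated in the same way. The sketch below uses a hypothetical helper `vector_63` and assumes the $\ell^p$-norm on $\mathbb{R}^2$, with tangent points given by polar angles whose consecutive gaps are less than $\pi$. It illustrates the note after Claim 6.1: for $p=2$ the vector vanishes for every such triangle, while for $p=4$ the Step 0 configuration gives a nonzero vector.

```python
import math


def lp_norm(v, p):
    """The l^p norm of a finite sequence."""
    return sum(abs(c) ** p for c in v) ** (1.0 / p)


def unit_at_angle(theta, p):
    """Point of the unit l^p sphere in the direction of the polar angle theta."""
    d = (math.cos(theta), math.sin(theta))
    n = lp_norm(d, p)
    return (d[0] / n, d[1] / n)


def tangent_covector(u, p):
    """xi_u with tangent line {xi_u = 1} at the unit vector u."""
    return [math.copysign(abs(c) ** (p - 1), c) for c in u]


def intersect(xi1, xi2):
    """Solve xi1(x) = 1 and xi2(x) = 1 in R^2 by Cramer's rule."""
    det = xi1[0] * xi2[1] - xi1[1] * xi2[0]
    return ((xi2[1] - xi1[1]) / det, (xi1[0] - xi2[0]) / det)


def area_with_origin(P, Q):
    """Lebesgue area of the triangle O P Q."""
    return abs(P[0] * Q[1] - P[1] * Q[0]) / 2


def vector_63(angles, p):
    """|OAB| c + |OBC| a + |OCA| b for tangent points a, b, c at the given angles."""
    a, b, c = (unit_at_angle(th, p) for th in angles)
    la, lb, lc = (tangent_covector(u, p) for u in (a, b, c))
    A, B, C = intersect(lb, lc), intersect(lc, la), intersect(la, lb)
    s_ab, s_bc, s_ca = area_with_origin(A, B), area_with_origin(B, C), area_with_origin(C, A)
    return tuple(s_ab * c[i] + s_bc * a[i] + s_ca * b[i] for i in range(2))


triples = [(0.3, 2.4, 4.4), (0.0, 2.0, 4.0), (1.0, 3.0, 5.5)]
for th in triples:
    print(th, vector_63(th, 2.0), vector_63(th, 4.0))
step0_angles = (math.pi, math.pi / 4, -math.pi / 4)   # a, b, c of Step 0
print(vector_63(step0_angles, 4.0))
```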

Step 2 ($n$-dimensional case with $n\ge3$) Suppose that $(\mathbb{R}^n,\|\cdot\|)$ is not an inner product space. Then there is a two-dimensional subspace $P\subset\mathbb{R}^n$ in which the restriction of $\|\cdot\|$ is not an inner product. We assume $P=\{(x,y,0,\ldots,0)\mid x,y\in\mathbb{R}\}$ for brevity, and sometimes identify this with $\mathbb{R}^2$. By Step 1 there is a function $\rho_R:(\mathbb{R}^2,\|\cdot\||_P)\to[0,\infty)$ such that $\int_{\mathbb{R}^2}\rho_R\,dx=1$, $\operatorname{supp}\rho_R\subset B((-R,0),r)$ for some fixed $r>0$ and that

$$\lim_{R\to\infty}\frac{1}{R}\int_{\operatorname{supp}\rho_R}[D(-f)(x)]\big(\nabla(-\rho_R)(x)\big)\,dx>0,$$

where we set $B(z,r):=\{w\in\mathbb{R}^2\mid\|w-z\|<r\}$ for $z\in\mathbb{R}^2$ and $r>0$. Using a smooth cut-off function $\eta_R:\mathbb{R}^{n-2}\to[0,\infty)$ such that $\eta_R=1$ on $B(0,\sqrt R)$, $\operatorname{supp}\eta_R\subset B(0,\sqrt R+1)$ and $\sup\|\nabla(-\eta_R)\|\le2$, define $\rho:(\mathbb{R}^n,\|\cdot\|)\to[0,\infty)$ by

$$\rho(x,y):=\Big(\int_{\mathbb{R}^{n-2}}\eta_R\,dz\Big)^{-1}\rho_R(x)\,\eta_R(y)$$

for $x\in\mathbb{R}^2$ and $y\in\mathbb{R}^{n-2}$. Note that $\nabla(-\rho)(x,y)=(\int_{\mathbb{R}^{n-2}}\eta_R\,dz)^{-1}\cdot\nabla(-\rho_R)(x)$ for $y\in B(0,\sqrt R)\subset\mathbb{R}^{n-2}$. Hence we have $Q(\rho)>0$, since the effect of the boundary of the cut-off is negligible for large $R$. Indeed, we observe that

$$|B(0,\sqrt R+1)\setminus B(0,\sqrt R)|\cdot\Big(\int_{\mathbb{R}^{n-2}}\eta_R\,dz\Big)^{-1}=O\big((\sqrt R)^{n-3}/(\sqrt R)^{n-2}\big)\to0$$

as $R$ goes to infinity.
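The last estimate only uses that, for any norm on $\mathbb{R}^m$, the volume of $B(0,r)$ is $|B(0,1)|\,r^m$, together with $\int\eta_R\,dz\ge|B(0,\sqrt R)|$. A short sketch of the resulting ratio with $m=n-2$ (the function name is hypothetical) shows the decay like $m/\sqrt R$:

```python
import math


def annulus_ratio(R, m):
    """|B(0, sqrt(R) + 1) minus B(0, sqrt(R))| / |B(0, sqrt(R))| in R^m.

    The volume of B(0, r) is |B(0, 1)| r^m for any norm on R^m, so the ratio
    does not depend on the norm.
    """
    r = math.sqrt(R)
    return ((r + 1) ** m - r ** m) / r ** m


for m in (1, 2, 3):
    print(m, [annulus_ratio(R, m) for R in (1e2, 1e4, 1e6, 1e8)])
```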

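Step 3 (Scaling) Suppose that there is $\rho$ with $Q(\rho)>0$, and set $\rho_\varepsilon(x):=\varepsilon^{-n}\rho(\varepsilon^{-1}x)$ for $\varepsilon>0$. Then, since $\nabla(-\rho_\varepsilon)(x)=\varepsilon^{-n-1}\nabla(-\rho)(\varepsilon^{-1}x)$ and $D(-f)(\varepsilon y)=\varepsilon D(-f)(y)$, we have

$$\int_{\operatorname{supp}\rho_\varepsilon}[D(-f)(x)]\big(\nabla(-\rho_\varepsilon)(x)\big)\,dx=\int_{\operatorname{supp}\rho_\varepsilon}\varepsilon^{-n-1}[D(-f)(x)]\big(\nabla(-\rho)(\varepsilon^{-1}x)\big)\,dx=\int_{\operatorname{supp}\rho}[D(-f)(x)]\big(\nabla(-\rho)(x)\big)\,dx$$

and

$$\int_{\mathbb{R}^n}\|-x\|^2\rho_\varepsilon(x)\,dx=\varepsilon^{-n}\int_{\mathbb{R}^n}\|-x\|^2\rho(\varepsilon^{-1}x)\,dx=\varepsilon^2\int_{\mathbb{R}^n}\|-x\|^2\rho(x)\,dx.$$

Therefore $Q(\rho_\varepsilon)=\varepsilon^{-2}Q(\rho)$, which diverges to infinity as $\varepsilon$ tends to zero. This completes the proof of Theorem 1.1.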
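The scaling law $Q(\rho_\varepsilon)=\varepsilon^{-2}Q(\rho)$ can be checked by quadrature. The sketch below assumes the simplest possible setting, the Euclidean line, where $f(x)=x^2/2$, $D(-f)(x)=-x$ and $\nabla(-\rho)=-\rho'$, and an arbitrary two-component Gaussian mixture for $\rho$. In that setting $Q(\rho)=-1/\int x^2\rho\,dx<0$, so the sketch illustrates only the scaling of Step 3, not the positivity obtained in Steps 0–2.

```python
import math


def trapezoid(f, a, b, n=20000):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h


def gauss(x, m, s):
    """Normal density with mean m and standard deviation s."""
    return math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))


def rho(x):
    """Illustrative smooth probability density on R."""
    return 0.7 * gauss(x, 1.0, 0.5) + 0.3 * gauss(x, -2.0, 1.0)


def drho(x):
    """Derivative rho'(x)."""
    return (-0.7 * (x - 1.0) / 0.25 * gauss(x, 1.0, 0.5)
            - 0.3 * (x + 2.0) * gauss(x, -2.0, 1.0))


def q_parts(eps):
    """Numerator and denominator of Q(rho_eps), rho_eps(x) = rho(x / eps) / eps.

    Euclidean line: D(-f)(x) = -x and nabla(-rho_eps)(x) = -rho'(x / eps) / eps^2.
    """
    L = 30.0 * eps
    num = trapezoid(lambda x: (-x) * (-drho(x / eps) / eps ** 2), -L, L)
    den = trapezoid(lambda x: x * x * rho(x / eps) / eps, -L, L)
    return num, den


base_num, base_den = q_parts(1.0)
q_base = base_num / base_den
for eps in (0.1, 0.5, 2.0):
    num, den = q_parts(eps)
    print(eps, num, den / eps ** 2, (num / den) * eps ** 2, q_base)
```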
Appendix: Skew convexity of distance functions on
Finsler manifolds



We finally investigate the skew convexity of squared distance functions on Finsler manifolds, which may be of independent interest from the geometric viewpoint. The convexity of distance functions is closely related to upper bounds of the sectional curvature in the Riemannian case. In our Finsler setting, we need two more quantities to control the distance function. See [Sh] and [Oh2] for related work on the usual convexity and concavity along geodesics.

Let $(M,F)$ be a $C^\infty$-Finsler manifold. We introduce some terminology, for which we refer to [BCS]. For a $C^1$-vector field $X$ on $M$ and tangent vectors $v,w\in T_xM$ with $w\neq0$, we define the covariant derivative of $X$ by $v$ with reference vector $w$ as

$$(D^w_vX)(x):=\sum_{i,j=1}^n\Big\{v^j\frac{\partial X^i}{\partial x^j}(x)+\sum_{k=1}^n\Gamma^i_{jk}(w)v^jX^k(x)\Big\}\frac{\partial}{\partial x^i}\Big|_x,$$

where $\Gamma^i_{jk}$ is the Christoffel symbol. If $\Gamma^i_{jk}(w)$ depends only on the point $x$ (i.e., is independent of the choice of $w\in T_xM\setminus\{0\}$) for all $x\in M$, then we say that $(M,F)$ is of Berwald type. In a Berwald space, all tangent spaces are isometric to each other. For instance, Riemannian manifolds and Minkowski spaces are of Berwald type.

By using the covariant derivative, the geodesic equation is written in a canonical way as $D^{\dot\gamma}_{\dot\gamma}\dot\gamma=0$. We will use the following formula borrowed from [BCS, Exercise 10.1.2]:

$$\frac{d}{dt}g_V(V,W)=g_V\big(D^V_{\dot\gamma}V,W\big)+g_V\big(V,D^V_{\dot\gamma}W\big) \tag{A.1}$$

for any $C^\infty$-curve $\gamma$ and $C^1$-vector fields $V$, $W$ along $\gamma$ such that $V\neq0$.

A $C^\infty$-vector field $V$ along a geodesic $\gamma:[0,l]\to M$ is called a Jacobi field if it satisfies the equation $D^{\dot\gamma}_{\dot\gamma}D^{\dot\gamma}_{\dot\gamma}V+R(V,\dot\gamma)\dot\gamma=0$, where $R:TM\otimes TM\to T^*M\otimes TM$ is the curvature tensor. Similarly to the Riemannian case, the variational vector field of a geodesic variation is a Jacobi field (and vice versa). For linearly independent vectors $v,w\in T_xM$, the flag curvature is defined by

$$\mathcal{K}(v,w):=\frac{g_v(R(w,v)v,w)}{F(v)^2g_v(w,w)-g_v(v,w)^2}.$$

We remark that $\mathcal{K}(v,w)$ depends not only on the plane in $T_xM$ spanned by $v$ and $w$ (flag), but also on the choice of $v$ in it (flagpole).

In order to state our theorem, we introduce the condition

$$g_V\big(V,D^V_WD^V_WV-D^W_WD^W_WV\big)\ge-\delta F(V)^2F(W)^2 \tag{A.2}$$

for non-vanishing $C^\infty$-vector fields $V$, $W$ and $\delta\ge0$. This clearly holds with $\delta=0$ for Berwald spaces. Thus $\delta$ measures how the tangent spaces are distorted as one moves (in $M$) along $W$. The injectivity radius $\mathrm{inj}(z)$ at $z\in M$ is the supremum of $R>0$ such that any unit speed geodesic $\gamma:[0,R)\to M$ with $\gamma(0)=z$ contains no cut point of $z$. We set $B(x,r):=\{y\in M\mid d(x,y)<r\}$ for $x\in M$ and $r>0$.
Theorem A.1 Let $(M,F)$ be a forward complete Finsler manifold and suppose that $\mathcal{K}\le k$, $S_F\le S$ and (A.2) hold for some $k\ge0$, $S\ge1$ and $\delta\ge0$. Then the function $f(x):=d(x,z)^2/2$ is $K(k,S,\delta,r)$-skew convex in $B(z,r)$ for all $z\in M$ and $r\in(0,R)$, where we set

$$K(k,S,\delta,r):=\sqrt{kS^2+\delta}\,r\cdot\cot\big(\sqrt{kS^2+\delta}\,r\big)$$

and $R:=\min\{\mathrm{inj}(z),\pi/\sqrt{kS^2+\delta}\}$. In particular, if $\mathcal{K}\le0$, then $f$ is $(\sqrt\delta\,r\cot(\sqrt\delta\,r))$-skew convex in $B(z,r)$ for $r\in(0,\min\{\mathrm{inj}(z),\pi/\sqrt\delta\})$ regardless of $S$.

Proof. Fix a unit speed minimal geodesic $\gamma:[0,l]\to B(z,r)$ with $r<R$, and let $\sigma:[0,l]\times[0,1]\to M$ be the $C^\infty$-variation such that $\sigma_s:=\sigma(s,\cdot)$ is the unique minimal geodesic from $\gamma(s)$ to $z$. Put $T(s,t):=\partial_t\sigma(s,t)$ and $V(s,t):=\partial_s\sigma(s,t)$. Observe that $\dot\gamma(s)=V(s,0)$ and $\nabla(-f)(\gamma(s))=T(s,0)$. Hence we need to bound the following:

$$\frac{\partial}{\partial s}\big[g_V(V(s,0),T(s,0))\big]=g_V\big(V(s,0),D^V_VT(s,0)\big). \tag{A.3}$$

We used (A.1) and the geodesic equation $D^V_VV(s,0)=0$. As $D^V_VT=D^V_TV$ (cf. [BCS, Exercise 5.2.1]), we deduce from (A.1) that

$$g_V\big(V(s,0),D^V_VT(s,0)\big)=g_V\big(V,D^V_TV\big)(s,0)=\frac12\frac{\partial}{\partial t}\big[g_V(V,V)\big](s,0). \tag{A.4}$$
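Again due to (A.1), we observe

$$\begin{aligned}\frac{\partial^2[F(V)]}{\partial t^2}&=\frac{\partial}{\partial t}\Big[\frac{g_V(V,D^V_TV)}{F(V)}\Big]\\&=\frac{g_V(V,D^V_TD^V_TV)+g_V(D^V_TV,D^V_TV)}{F(V)}-\frac{g_V(V,D^V_TV)^2}{F(V)^3}\\&=\frac{g_V(V,D^V_TD^V_TV)}{F(V)}+\frac{F(V)^2g_V(D^V_TV,D^V_TV)-g_V(V,D^V_TV)^2}{F(V)^3}.\end{aligned}$$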

The second term is nonnegative by the Cauchy–Schwarz inequality. Moreover, by the assumption (A.2), we have

$$g_V\big(V,D^V_TD^V_TV\big)\ge g_V\big(V,D^T_TD^T_TV\big)-\delta F(V)^2F(T)^2.$$

Since $V(s,\cdot)$ is a Jacobi field, it holds $D^T_TD^T_TV=-R(V,T)T$ and hence

$$\frac{\partial^2[F(V)]}{\partial t^2}\ge-\frac{\mathcal{K}(T,V)\{F(T)^2g_T(V,V)-g_T(T,V)^2\}}{F(V)}-\delta F(V)F(T)^2.$$

As $k\ge0$, it follows from $S_F\le S$ (recall that this ensures $g_T(V,V)\le S^2F(V)^2$) that

$$-k\{F(T)^2g_T(V,V)-g_T(T,V)^2\}\ge-kF(T)^2g_T(V,V)\ge-kS^2F(V)^2F(T)^2.$$
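Hence we obtain, together with $F(T)<r$,

$$\frac{\partial^2[F(V)]}{\partial t^2}\ge-(kS^2+\delta)r^2F(V).$$

The above inequality shows that the function

$$\sin\big(\sqrt{kS^2+\delta}\,r(1-t)\big)\frac{\partial[F(V)]}{\partial t}-F(V)\frac{\partial}{\partial t}\Big[\sin\big(\sqrt{kS^2+\delta}\,r(1-t)\big)\Big]$$

is non-decreasing in $t\in[0,1]$, so that it is nonpositive for all $t$ (it vanishes at $t=1$ since $V(s,1)=0$). Thus we have

$$\frac12\frac{\partial}{\partial t}\big[F(V)^2\big]=F(V)\frac{\partial[F(V)]}{\partial t}\le-\sqrt{kS^2+\delta}\,r\cdot\cot\big(\sqrt{kS^2+\delta}\,r(1-t)\big)F(V)^2. \tag{A.5}$$

Combining (A.3), (A.4), (A.5) and $F(V(s,0))=F(\dot\gamma(s))=1$, we conclude

$$\frac{\partial}{\partial s}\big[g_V(V(s,0),T(s,0))\big]\le-\sqrt{kS^2+\delta}\,r\cdot\cot\big(\sqrt{kS^2+\delta}\,r\big).$$

This completes the proof.
□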
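The comparison step in the proof above can be illustrated with a profile $\varphi(t)$ in place of $F(V(s,t))$. Writing $c=\sqrt{kS^2+\delta}\,r\in(0,\pi)$, the function $W(t)=\sin(c(1-t))\varphi'(t)+c\cos(c(1-t))\varphi(t)$ satisfies $W'=\sin(c(1-t))(\varphi''+c^2\varphi)\ge0$ and $W(1)=0$ when $\varphi(1)=0$, hence $W\le0$. The sketch evaluates $W$ for $\varphi(t)=1-t$ and for the extremal profile $\sin(c(1-t))/\sin c$, and computes the constant $K(k,S,\delta,r)=c\cot c$, which tends to $1$ as $c\to0$ (the value behind the $1$-skew convexity in Corollary A.2). The value $c=2$ and the other parameters are arbitrary, and the function names are hypothetical.

```python
import math


def skew_constant(k, S, delta, r):
    """K(k, S, delta, r) = c cot c with c = sqrt(k S^2 + delta) r, for 0 <= c < pi."""
    c = math.sqrt(k * S * S + delta) * r
    if c == 0.0:
        return 1.0  # the limit of c cot c as c -> 0
    return c / math.tan(c)


def wronskian(t, c, phi, dphi):
    """sin(c(1-t)) phi'(t) - phi(t) d/dt[sin(c(1-t))]."""
    u = c * (1 - t)
    return math.sin(u) * dphi(t) + c * math.cos(u) * phi(t)


c = 2.0  # arbitrary value in (0, pi)


def phi_lin(t):
    """phi(t) = 1 - t: phi'' = 0 >= -c^2 phi and phi(1) = 0."""
    return 1 - t


def dphi_lin(t):
    return -1.0


def phi_cmp(t):
    """Extremal profile: phi'' = -c^2 phi exactly, phi(0) = 1, phi(1) = 0."""
    return math.sin(c * (1 - t)) / math.sin(c)


def dphi_cmp(t):
    return -c * math.cos(c * (1 - t)) / math.sin(c)


ts = [k / 50 for k in range(51)]
w_lin = [wronskian(t, c, phi_lin, dphi_lin) for t in ts]
w_cmp = [wronskian(t, c, phi_cmp, dphi_cmp) for t in ts]
print(max(w_lin), max(abs(x) for x in w_cmp), skew_constant(1.0, 1.0, 3.0, 1.0))
```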



Interestingly enough, what appears in Theorem A.1 is not the 2-uniform convexity constant $C$, but the smoothness constant $S$. Compare this with the usual convexity in [Oh2, Theorem 5.1]. We finally state the Berwald case separately.

Corollary A.2 Let $(M,F)$ be forward complete and of Berwald type and suppose that $\mathcal{K}\le k$ and $S_F\le S$ hold for some $k\ge0$ and $S\ge1$. Then the function $f(x):=d(x,z)^2/2$ is $(\sqrt k\,Sr\cot(\sqrt k\,Sr))$-skew convex in $B(z,r)$ for all $z\in M$ and $r\in(0,\min\{\mathrm{inj}(z),\pi/(\sqrt k\,S)\})$. In particular, if $\mathcal{K}\le0$, then $f$ is $1$-skew convex in $B(z,\mathrm{inj}(z))$ regardless of $S$.

This recovers the 1-skew convexity of $f(x)=\|-x\|^2/2$ on Minkowski spaces in Section m